Source of the materials: Biopython cookbook (adapted) Status: Draft

Accessing NCBI’s Entrez databases

Entrez Guidelines

EInfo: Obtaining information about the Entrez databases

ESearch: Searching the Entrez databases

EPost: Uploading a list of identifiers

EFetch: Downloading full records from Entrez

History and WebEnv

Specialized parsers


Entrez is a data retrieval system that provides users access to NCBI’s databases such as PubMed, GenBank, GEO, and many others. You can access Entrez from a web browser to enter queries manually, or you can use Biopython’s Bio.Entrez module for programmatic access to Entrez. The latter lets you, for example, search PubMed or download GenBank records from within a Python script.

The Bio.Entrez module makes use of the Entrez Programming Utilities (also known as EUtils), consisting of eight tools that are described in detail on NCBI’s E-utilities help pages. Each of these tools corresponds to one Python function in the Bio.Entrez module, as described in the sections below. This module makes sure that the correct URL is used for the queries, and that no more than three requests are made per second, as required by NCBI (see the Entrez Guidelines section).

The output returned by the Entrez Programming Utilities is typically in XML format. To parse such output, you have several options:

  1. Use Bio.Entrez’s parser to parse the XML output into a Python object;

  2. Use the DOM (Document Object Model) parser in Python’s standard library;

  3. Use the SAX (Simple API for XML) parser in Python’s standard library;

  4. Read the XML output as raw text, and parse it by string searching and manipulation.

For the DOM and SAX parsers, see the Python documentation. The parser in Bio.Entrez is discussed below.
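As a rough illustration of options 2 and 3, the sketch below parses a small hand-written XML fragment with the standard library’s DOM and SAX parsers. The fragment only imitates the general shape of an EInfo database list: the element names and the three database names are assumptions chosen for the example, not captured NCBI output.

```python
import xml.sax
from xml.dom import minidom

# Hand-written sample that imitates an EInfo-style database list
# (an assumption for illustration, not real NCBI output).
SAMPLE_XML = """<?xml version="1.0"?>
<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
  </DbList>
</eInfoResult>"""


def names_with_dom(xml_text):
    """Option 2: build the whole tree in memory, then query it."""
    document = minidom.parseString(xml_text)
    return [node.firstChild.data for node in document.getElementsByTagName("DbName")]


class DbNameHandler(xml.sax.ContentHandler):
    """Option 3: collect DbName text from a stream of parser events."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # list of text chunks while inside <DbName>

    def startElement(self, name, attrs):
        if name == "DbName":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "DbName":
            self.names.append("".join(self._buffer))
            self._buffer = None


def names_with_sax(xml_text):
    handler = DbNameHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.names


dom_names = names_with_dom(SAMPLE_XML)
sax_names = names_with_sax(SAMPLE_XML)
print(dom_names)
print(sax_names)
```

The DOM version is shorter but holds the whole document in memory, while the SAX version only keeps what the handler chooses to store.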

NCBI uses DTD (Document Type Definition) files to describe the structure of the information contained in XML files. Most of the DTD files used by NCBI are included in the Biopython distribution. The Bio.Entrez parser makes use of the DTD files when parsing an XML file returned by NCBI Entrez.

Occasionally, you may find that the DTD file associated with a specific XML file is missing in the Biopython distribution. In particular, this may happen when NCBI updates its DTD files. If this happens, the parser will show a warning message with the name and URL of the missing DTD file, and will then fetch the missing DTD file over the internet so that parsing of the XML file can continue. However, the parser is much faster if the DTD file is available locally. For this purpose, please download the DTD file from the URL in the warning message and place it in the Bio/Entrez/DTDs directory of your Biopython installation, which contains the other DTD files. If you don’t have write access to this directory, you can also place the DTD file in ~/.biopython/Bio/Entrez/DTDs, where ~ represents your home directory. Since this directory is read before the installation directory, you can also put newer versions of DTD files there if the bundled ones become outdated. Alternatively, if you installed Biopython from source, you can add the DTD file to the source code’s Bio/Entrez/DTDs directory, and reinstall Biopython. This will install the new DTD file in the correct location together with the other DTD files.
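The lookup order for DTD files can be pictured with a short sketch. This is not Biopython’s actual code: the function name find_dtd and the directory layout are hypothetical, and the only behaviour taken from the text is that the per-user directory is consulted before the directory holding the bundled DTD files.

```python
import tempfile
from pathlib import Path


def find_dtd(filename, search_dirs):
    """Return the first existing copy of `filename`, honouring directory order.

    Hypothetical helper for illustration only; it mimics the rule that a
    directory listed earlier shadows a directory listed later.
    """
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None  # a real parser would then fall back to downloading the DTD


# Demonstration with throwaway directories standing in for the real ones.
with tempfile.TemporaryDirectory() as tmp:
    user_dir = Path(tmp) / "user" / "Bio" / "Entrez" / "DTDs"
    install_dir = Path(tmp) / "install" / "Bio" / "Entrez" / "DTDs"
    user_dir.mkdir(parents=True)
    install_dir.mkdir(parents=True)

    # Both directories hold example.dtd; only the installation has other.dtd.
    (user_dir / "example.dtd").write_text("<!-- newer copy -->")
    (install_dir / "example.dtd").write_text("<!-- bundled copy -->")
    (install_dir / "other.dtd").write_text("<!-- bundled only -->")

    search_order = [user_dir, install_dir]
    shadowed = find_dtd("example.dtd", search_order)
    bundled = find_dtd("other.dtd", search_order)
    missing = find_dtd("absent.dtd", search_order)

    shadowed_from_user = shadowed.parent == user_dir
    bundled_from_install = bundled.parent == install_dir

print(shadowed_from_user, bundled_from_install, missing)
```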

The Entrez Programming Utilities can also generate output in other formats, such as the Fasta or GenBank file formats for sequence databases, or the MedLine format for the literature database, discussed in Section Specialized parsers.

Entrez Guidelines

Before using Biopython to access the NCBI’s online resources (via Bio.Entrez or some of the other modules), please read the NCBI’s Entrez User Requirements. If the NCBI finds you are abusing their systems, they can and will ban your access!

To paraphrase:

  • For any series of more than 100 requests, do this at weekends or outside USA peak times. This is up to you to obey.

  • Use the dedicated E-utilities address, not the standard NCBI Web address. Biopython uses this address.

  • Make no more than three requests per second (relaxed from at most one request every three seconds in early 2009). This is automatically enforced by Biopython.

  • Use the optional email parameter so the NCBI can contact you if there is a problem. You can either explicitly set this as a parameter with each call to Entrez (e.g. include email=“A.N.Other@example.com” in the argument list), or you can set a global email address:

In [1]:
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # placeholder; use your own address

Bio.Entrez will then use this email address with each call to Entrez. The example.com address is a reserved domain name specifically for documentation (RFC 2606). Please DO NOT use a random email – it’s better not to give an email at all. The email parameter will be mandatory from June 1, 2010. In case of excessive usage, NCBI will attempt to contact a user at the e-mail address provided prior to blocking access to the E-utilities.

If you are using Biopython within some larger software suite, use the tool parameter to specify this. You can either explicitly set the tool name as a parameter with each call to Entrez (e.g. include tool=“MyLocalScript” in the argument list), or you can set a global tool name:

In [2]:
from Bio import Entrez
Entrez.tool = "MyLocalScript"

The tool parameter will default to Biopython.

  • For large queries, the NCBI also recommend using their session history feature (the WebEnv session cookie string, see Section History and WebEnv). This is only slightly more complicated.

In conclusion, be sensible with your usage levels. If you plan to download lots of data, consider other options. For example, if you want easy access to all the human genes, consider fetching each chromosome by FTP as a GenBank file, and importing these into your own BioSQL database (see Section [sec:BioSQL]).
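Biopython already enforces the request rate for you, so the following is only a sketch of how such throttling could work in your own code, for example around requests that do not go through Bio.Entrez. The class name RateLimiter and its parameters are hypothetical; the limit of three calls per second is taken from the guidelines above. A fake clock is injected so that the demonstration runs instantly.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls within any sliding `period` seconds."""

    def __init__(self, max_calls=3, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # times of the most recent calls

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have left the sliding window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest recorded call drops out of the window.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)


class FakeClock:
    """Stand-in for time.monotonic/time.sleep, used only for this demo."""

    def __init__(self):
        self.now = 0.0
        self.slept = []

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds


fake = FakeClock()
limiter = RateLimiter(max_calls=3, period=1.0, clock=fake.time, sleep=fake.sleep)
for _ in range(4):
    limiter.wait()  # the fourth call has to wait for the window to move on

print(fake.slept)
```

In real use you would construct RateLimiter() with its defaults and call limiter.wait() immediately before each request.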

EInfo: Obtaining information about the Entrez databases

EInfo provides field index term counts, last update, and available links for each of NCBI’s databases. In addition, you can use EInfo to obtain a list of all database names accessible through the Entrez utilities:

In [3]:
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.einfo()
result = handle.read()
handle.close()

The variable result now contains a list of databases in XML format:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD einfo 20130322//EN" "">



Since this is a fairly simple XML file, we could extract the information it contains simply by string searching. Using Bio.Entrez’s parser instead, we can directly parse this XML file into a Python object:

In [4]:
from Bio import Entrez
handle = Entrez.einfo()
record = Entrez.read(handle)

Now record is a dictionary with exactly one key:

In [5]:
record.keys()


The value stored under this key is the list of database names shown in the XML above:

In [6]:
record["DbList"]

['pubmed', 'protein', 'nuccore', 'nucleotide', 'nucgss', 'nucest', 'structure', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'clone', 'gap', 'gapplus', 'grasp', 'dbvar', 'epigenomics', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'probe', 'proteinclusters', 'pcassay', 'biosystems', 'pccompound', 'pcsubstance', 'pubmedhealth', 'seqannot', 'snp', 'sra', 'taxonomy', 'unigene', 'gencoll', 'gtr']

For each of these databases, we can use EInfo again to obtain more information:

In [7]:
from Bio import Entrez
handle = Entrez.einfo(db="pubmed")
record = Entrez.read(handle)
record["DbInfo"]["Description"]

'PubMed bibliographic record'

In [8]:
record["DbInfo"].keys()

dict_keys(['LastUpdate', 'Count', 'DbName', 'Description', 'MenuName', 'FieldList', 'DbBuild', 'LinkList'])

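In [9]:
handle = Entrez.einfo(db="pubmed")
record = Entrez.read(handle)
record["DbInfo"]["Description"]

'PubMed bibliographic record'

In [10]:
record["DbInfo"]["Count"]

In [11]:
record["DbInfo"]["LastUpdate"]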

'2016/01/12 18:56'

Try record["DbInfo"].keys() for other information stored in this record. One of the most useful is a list of possible search fields for use with ESearch:

In [12]:
for field in record["DbInfo"]["FieldList"]:
    print("%(Name)s, %(FullName)s, %(Description)s" % field)

ALL, All Fields, All terms from all searchable fields
UID, UID, Unique number assigned to publication
FILT, Filter, Limits the records
TITL, Title, Words in title of publication
WORD, Text Word, Free text associated with publication
MESH, MeSH Terms, Medical Subject Headings assigned to publication
MAJR, MeSH Major Topic, MeSH terms of major importance to publication
AUTH, Author, Author(s) of publication
JOUR, Journal, Journal abbreviation of publication
AFFL, Affiliation, Author's institutional affiliation and address
ECNO, EC/RN Number, EC number for enzyme or CAS registry number
SUBS, Supplementary Concept, CAS chemical name or MEDLINE Substance Name
PDAT, Date - Publication, Date of publication
EDAT, Date - Entrez, Date publication first accessible through Entrez
VOL, Volume, Volume number of publication
PAGE, Pagination, Page number(s) of publication
PTYP, Publication Type, Type of publication (e.g., review)
LANG, Language, Language of publication
ISS, Issue, Issue number of publication
SUBH, MeSH Subheading, Additional specificity for MeSH term
SI, Secondary Source ID, Cross-reference from publication to other databases
MHDA, Date - MeSH, Date publication was indexed with MeSH terms
TIAB, Title/Abstract, Free text associated with Abstract/Title
OTRM, Other Term, Other terms associated with publication
INVR, Investigator, Investigator
COLN, Author - Corporate, Corporate Author of publication
CNTY, Place of Publication, Country of publication
PAPX, Pharmacological Action, MeSH pharmacological action pre-explosions
GRNT, Grant Number, NIH Grant Numbers
MDAT, Date - Modification, Date of last modification
CDAT, Date - Completion, Date of completion
PID, Publisher ID, Publisher ID
FAUT, Author - First, First Author of publication
FULL, Author - Full, Full Author Name(s) of publication
FINV, Investigator - Full, Full name of investigator
TT, Transliterated Title, Words in transliterated title of publication
LAUT, Author - Last, Last Author of publication
PPDT, Print Publication Date, Date of print publication
EPDT, Electronic Publication Date, Date of Electronic publication
LID, Location ID, ELocation ID
CRDT, Date - Create, Date publication first accessible through Entrez
BOOK, Book, ID of the book that contains the document
ED, Editor, Section's Editor
PUBN, Publisher, Publisher's name
AUCL, Author Cluster ID, Author Cluster ID
EID, Extended PMID, Extended PMID
DSO, DSO, Additional text from the summary
AUID, Author - Identifier, Author Identifier
PS, Subject - Personal Name, Personal Name as Subject

That’s a long list, but indirectly this tells you that for the PubMed database, you can do things like Jones[AUTH] to search the author field, or Sanger[AFFL] to restrict to authors at the Sanger Centre. This can be very handy - especially if you are not so familiar with a particular database.
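A small sketch of how the field tags listed above could be combined into a search term. The helper name build_term is hypothetical; the only syntax assumed is the value[FIELD] form and the AND operator, both of which appear in the search examples in this document.

```python
def build_term(fields, operator="AND"):
    """Join {field_tag: value} pairs into a fielded Entrez query string."""
    parts = ["{}[{}]".format(value, tag) for tag, value in fields.items()]
    return " {} ".format(operator).join(parts)


# Field tags taken from the EInfo field list for PubMed.
term = build_term({"AUTH": "Jones", "AFFL": "Sanger"})
print(term)  # Jones[AUTH] AND Sanger[AFFL]
```

The resulting string could then be passed as the term argument of Bio.Entrez.esearch().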

ESearch: Searching the Entrez databases

To search any of these databases, we use Bio.Entrez.esearch(). For example, let’s search in PubMed for publications related to Biopython:

In [13]:
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.esearch(db="pubmed", term="biopython")
record = Entrez.read(handle)
record["IdList"]

['24929426', '24497503', '24267035', '24194598', '23842806', '23157543', '22909249', '22399473', '21666252', '21210977', '20015970', '19811691', '19773334', '19304878', '18606172', '21585724', '16403221', '16377612', '14871861', '14630660']

In [14]:
record

{'TranslationStack': [{'Explode': 'N', 'Term': 'biopython[All Fields]', 'Field': 'All Fields', 'Count': '21'}, 'GROUP'], 'IdList': ['24929426', '24497503', '24267035', '24194598', '23842806', '23157543', '22909249', '22399473', '21666252', '21210977', '20015970', '19811691', '19773334', '19304878', '18606172', '21585724', '16403221', '16377612', '14871861', '14630660'], 'TranslationSet': [], 'QueryTranslation': 'biopython[All Fields]', 'Count': '21', 'RetStart': '0', 'RetMax': '20'}

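In this output, you see twenty PubMed IDs (including 19304878, which is the PMID for the Biopython application note). The search matched 21 records in total, as shown by Count, but only the first 20 were returned, as shown by RetMax. The records themselves can be retrieved by EFetch (see section EFetch: Downloading full records from Entrez).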
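The Count and RetMax values in the record show that the search matched more records than were returned. The sketch below uses a trimmed copy of the dictionary printed above to work out the offsets needed to page through all results. The helper name page_offsets is hypothetical; retstart is the ESearch parameter that corresponds to the RetStart value in the record.

```python
def page_offsets(count, page_size):
    """Return the retstart offsets needed to cover `count` results."""
    return list(range(0, count, page_size))


# Trimmed copy of the ESearch record shown above (the values are strings).
record = {"Count": "21", "RetStart": "0", "RetMax": "20"}

count = int(record["Count"])
page_size = int(record["RetMax"])
offsets = page_offsets(count, page_size)
print(offsets)  # [0, 20]
```

Each offset would be passed as retstart, together with retmax, to a further Entrez.esearch() call. For large result sets the history feature described in Section History and WebEnv is usually the better choice.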

You can also use ESearch to search GenBank. Here we’ll do a quick search for the matK gene in Cypripedioideae orchids (see the EInfo section above for one way to find out which fields you can search in each Entrez database):

In [15]:
handle = Entrez.esearch(db="nucleotide", term="Cypripedioideae[Orgn] AND matK[Gene]")
record = Entrez.read(handle)
record["Count"]

In [16]:
record["IdList"]

['844174433', '937957673', '694174838', '944541375', '575524123', '575524121', '575524119', '575524117', '575524115', '575524113', '575524111', '575524109', '575524107', '575524105', '575524103', '575524101', '575524099', '575524097', '575524095', '575524093']

Each of the IDs (844174433, 937957673, 694174838, …) is a GenBank identifier. See section EFetch: Downloading full records from Entrez for information on how to actually download these GenBank records.

Note that instead of a species name like Cypripedioideae[Orgn], you can restrict the search using an NCBI taxon identifier, here this would be txid158330[Orgn]. This isn’t currently documented on the ESearch help page - the NCBI explained this in reply to an email query. You can often deduce the search term formatting by playing with the Entrez web interface. For example, including complete[prop] in a genome search restricts to just completed genomes.

As a final example, let’s get a list of computational journal titles:

In [17]:
# nlmcatalog
# handle = Entrez.esearch(db="nlmcatalog", term="computational")
# record = Entrez.read(handle)
# record["Count"]
handle = Entrez.esearch(db="nlmcatalog", term="biopython[Journal]", retmax="20")
record = Entrez.read(handle)
print("{} computational Journals found".format(record["Count"]))
print("The first 20 are\n{}".format(record['IdList']))

0 computational Journals found
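The first 20 are
[]

This particular query, biopython[Journal], matches no journals, so the ID list is empty. For a query that does return journal IDs, we could again use EFetch to obtain more information for each of them.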

ESearch has many useful options — see the ESearch help page for more information.

EPost: Uploading a list of identifiers

EPost uploads a list of UIs for use in subsequent search strategies; see the EPost help page for more information. It is available from Biopython through the Bio.Entrez.epost() function.

To give an example of when this is useful, suppose you have a long list of IDs you want to download using EFetch (maybe sequences, maybe citations – anything). When you make a request with EFetch your list of IDs, the database etc, are all turned into a long URL sent to the server. If your list of IDs is long, this URL gets long, and long URLs can break (e.g. some proxies don’t cope well).

Instead, you can break this up into two steps, first uploading the list of IDs using EPost (this uses an HTTP POST request internally, rather than an HTTP GET request, getting round the long URL problem). With the history support, you can then refer to this long list of IDs, and download the associated data with EFetch.
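To make the long URL problem concrete, the sketch below builds the URL that a GET request would need for a short and a long list of identifiers and compares their lengths. No request is sent. The endpoint address is an assumption used only to build strings, the helper name get_url is hypothetical, and the identifiers are made up.

```python
from urllib.parse import urlencode

# Assumed EFetch endpoint, used here only to build strings; nothing is sent.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def get_url(db, ids):
    """Build the URL that a GET request for `ids` would need."""
    query = urlencode({"db": db, "id": ",".join(ids)})
    return "{}?{}".format(EFETCH_URL, query)


few_ids = [str(10000000 + i) for i in range(5)]      # made-up identifiers
many_ids = [str(10000000 + i) for i in range(1000)]  # made-up identifiers

short_url = get_url("pubmed", few_ids)
long_url = get_url("pubmed", many_ids)
print(len(short_url), len(long_url))
```

With a thousand identifiers the URL runs to many thousands of characters, which is the kind of length that some proxies and servers reject. Sending the same identifiers in the body of a POST request, as EPost does, avoids this.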

Let’s look at a simple example to see how EPost works – uploading some PubMed identifiers:

In [18]:
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
id_list = ["19304878", "18606172", "16403221", "16377612", "14871861", "14630660"]
print(Entrez.epost("pubmed", id=",".join(id_list)).read())

<?xml version="1.0"?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "">

The returned XML includes two important strings, QueryKey and WebEnv which together define your history session. You would extract these values for use with another Entrez call such as EFetch:

In [19]:
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
id_list = ["19304878", "18606172", "16403221", "16377612", "14871861", "14630660"]
search_results = Entrez.read(Entrez.epost("pubmed", id=",".join(id_list)))
webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]

Section History and WebEnv shows how to use the history feature.

ESummary: Retrieving summaries from primary IDs

ESummary retrieves document summaries from a list of primary IDs (see the ESummary help page for more information). In Biopython, ESummary is available as Bio.Entrez.esummary(). For example, we can find out more about the journal with ID 101660833:

In [20]:
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.esummary(db="nlmcatalog", term="[journal]", id="101660833")
record = Entrez.read(handle)
info = record[0]['TitleMainList'][0]
print("Journal info\nid: {}\nTitle: {}".format(record[0]["Id"], info["Title"]))

Journal info
id: 101660833
Title: IEEE transactions on computational imaging.

EFetch: Downloading full records from Entrez

EFetch is what you use when you want to retrieve a full record from Entrez. This covers several possible databases, as described on the main EFetch Help page.

For most of their databases, the NCBI support several different file formats. Requesting a specific file format from Entrez using Bio.Entrez.efetch() requires specifying the rettype and/or retmode optional arguments. The different combinations are described for each database type on the pages linked to on NCBI efetch webpage (e.g. literature, sequences and taxonomy).

One common usage is downloading sequences in the FASTA or GenBank/GenPept plain text formats (which can then be parsed with Bio.SeqIO, as shown later in this section). Continuing the Cypripedioideae example above, we can download GenBank record 186972394 using Bio.Entrez.efetch:

In [21]:
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.efetch(db="nucleotide", id="186972394", rettype="gb", retmode="text")
print(handle.read())

LOCUS       EU490707                1302 bp    DNA     linear   PLN 15-JAN-2009
DEFINITION  Selenipedium aequinoctiale maturase K (matK) gene, partial cds;
VERSION     EU490707.1  GI:186972394
SOURCE      chloroplast Selenipedium aequinoctiale
  ORGANISM  Selenipedium aequinoctiale
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Asparagales; Orchidaceae;
            Cypripedioideae; Selenipedium.
REFERENCE   1  (bases 1 to 1302)
  AUTHORS   Neubig,K.M., Whitten,W.M., Carlsward,B.S., Blanco,M.A., Endara,L.,
            Williams,N.H. and Moore,M.
  TITLE     Phylogenetic utility of ycf1 in orchids: a plastid gene more
            variable than matK
  JOURNAL   Plant Syst. Evol. 277 (1-2), 75-84 (2009)
REFERENCE   2  (bases 1 to 1302)
  AUTHORS   Neubig,K.M., Whitten,W.M., Carlsward,B.S., Blanco,M.A.,
            Endara,C.L., Williams,N.H. and Moore,M.J.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-FEB-2008) Department of Botany, University of
            Florida, 220 Bartram Hall, Gainesville, FL 32611-8526, USA
FEATURES             Location/Qualifiers
     source          1..1302
                     /organism="Selenipedium aequinoctiale"
                     /mol_type="genomic DNA"
                     /specimen_voucher="FLAS:Blanco 2475"
     gene            <1..>1302
     CDS             <1..>1302
                     /product="maturase K"
        1 attttttacg aacctgtgga aatttttggt tatgacaata aatctagttt agtacttgtg
       61 aaacgtttaa ttactcgaat gtatcaacag aattttttga tttcttcggt taatgattct
      121 aaccaaaaag gattttgggg gcacaagcat tttttttctt ctcatttttc ttctcaaatg
      181 gtatcagaag gttttggagt cattctggaa attccattct cgtcgcaatt agtatcttct
      241 cttgaagaaa aaaaaatacc aaaatatcag aatttacgat ctattcattc aatatttccc
      301 tttttagaag acaaattttt acatttgaat tatgtgtcag atctactaat accccatccc
      361 atccatctgg aaatcttggt tcaaatcctt caatgccgga tcaaggatgt tccttctttg
      421 catttattgc gattgctttt ccacgaatat cataatttga atagtctcat tacttcaaag
      481 aaattcattt acgccttttc aaaaagaaag aaaagattcc tttggttact atataattct
      541 tatgtatatg aatgcgaata tctattccag tttcttcgta aacagtcttc ttatttacga
      601 tcaacatctt ctggagtctt tcttgagcga acacatttat atgtaaaaat agaacatctt
      661 ctagtagtgt gttgtaattc ttttcagagg atcctatgct ttctcaagga tcctttcatg
      721 cattatgttc gatatcaagg aaaagcaatt ctggcttcaa agggaactct tattctgatg
      781 aagaaatgga aatttcatct tgtgaatttt tggcaatctt attttcactt ttggtctcaa
      841 ccgtatagga ttcatataaa gcaattatcc aactattcct tctcttttct ggggtatttt
      901 tcaagtgtac tagaaaatca tttggtagta agaaatcaaa tgctagagaa ttcatttata
      961 ataaatcttc tgactaagaa attcgatacc atagccccag ttatttctct tattggatca
     1021 ttgtcgaaag ctcaattttg tactgtattg ggtcatccta ttagtaaacc gatctggacc
     1081 gatttctcgg attctgatat tcttgatcga ttttgccgga tatgtagaaa tctttgtcgt
     1141 tatcacagcg gatcctcaaa aaaacaggtt ttgtatcgta taaaatatat acttcgactt
     1201 tcgtgtgcta gaactttggc acggaaacat aaaagtacag tacgcacttt tatgcgaaga
     1261 ttaggttcgg gattattaga agaattcttt atggaagaag aa

The arguments rettype="gb" and retmode="text" let us download this record in the GenBank format.

Note that until Easter 2009, the Entrez EFetch API let you use “genbank” as the return type. The NCBI now insist on using the official return types of “gb” or “gbwithparts” (or “gp” for proteins), as described online. Also note that until Feb 2012, the Entrez EFetch API would default to returning plain text files, but now defaults to XML.

Alternatively, you could for example use rettype="fasta" to get the Fasta-format; see the EFetch Sequences Help page for other options. Remember – the available formats depend on which database you are downloading from - see the main EFetch Help page.
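One way to keep the rettype and retmode combinations straight is to collect them in a small lookup table. The sketch below covers only the sequence formats named in this section; the table name, the keys and the helper efetch_arguments are hypothetical, and other databases accept different values.

```python
# Combinations named in this section for the sequence databases. Other
# databases accept different values, so treat this table as an example only.
SEQUENCE_FORMATS = {
    "genbank": {"rettype": "gb", "retmode": "text"},
    "genbank_with_parts": {"rettype": "gbwithparts", "retmode": "text"},
    "genpept": {"rettype": "gp", "retmode": "text"},
    "fasta": {"rettype": "fasta", "retmode": "text"},
}


def efetch_arguments(db, record_id, format_name):
    """Build keyword arguments that could be passed to Bio.Entrez.efetch()."""
    try:
        format_arguments = SEQUENCE_FORMATS[format_name]
    except KeyError:
        raise ValueError("unknown format: {!r}".format(format_name)) from None
    return {"db": db, "id": record_id, **format_arguments}


arguments = efetch_arguments("nucleotide", "186972394", "genbank")
print(arguments)
```

The dictionary could then be unpacked in a call such as Entrez.efetch(**arguments).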

If you fetch the record in one of the formats accepted by Bio.SeqIO (see Chapter [chapter:Bio.SeqIO]), you could directly parse it into a SeqRecord:

In [22]:
from Bio import Entrez, SeqIO
handle = Entrez.efetch(db="nucleotide", id="186972394", rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()
print(record)

ID: EU490707.1
Name: EU490707
Description: Selenipedium aequinoctiale maturase K (matK) gene, partial cds; chloroplast.
Number of features: 3
/taxonomy=['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'Liliopsida', 'Asparagales', 'Orchidaceae', 'Cypripedioideae', 'Selenipedium']
/references=[Reference(title='Phylogenetic utility of ycf1 in orchids: a plastid gene more variable than matK', ...), Reference(title='Direct Submission', ...)]
/organism=Selenipedium aequinoctiale
/source=chloroplast Selenipedium aequinoctiale

Note that a more typical use would be to save the sequence data to a local file, and then parse it with Bio.SeqIO. This can save you having to re-download the same file repeatedly while working on your script, and places less load on the NCBI’s servers. For example:

In [23]:
import os
from Bio import SeqIO
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
filename = "gi_186972394.gbk"
if not os.path.isfile(filename):
    # Download the record once and cache it on disk.
    # The network handle is closed explicitly: with some Biopython versions it
    # cannot be used in a with statement (AttributeError: __exit__).
    net_handle = Entrez.efetch(db="nucleotide", id="186972394", rettype="gb", retmode="text")
    with open(filename, "w") as out_handle:
        out_handle.write(net_handle.read())
    net_handle.close()
# Parse the local copy
record = SeqIO.read(filename, "genbank")
print(record)

To get the output in XML format, which you can parse using the Entrez.read() function, use retmode="xml":

In [24]:
from Bio import Entrez
handle = Entrez.efetch(db="nucleotide", id="186972394", retmode="xml")
record = Entrez.read(handle)
handle.close()
record[0]["GBSeq_definition"]

'Selenipedium aequinoctiale maturase K (matK) gene, partial cds; chloroplast'

In [25]:
record[0]["GBSeq_source"]

'chloroplast Selenipedium aequinoctiale'

So, that dealt with sequences. For examples of parsing file formats specific to the other databases (e.g. the MEDLINE format used in PubMed), see Section Specialized parsers.

If you want to perform a search with Bio.Entrez.esearch(), and then download the records with Bio.Entrez.efetch(), you should use the WebEnv history feature – see Section History and WebEnv.

ELink: Searching for related items in NCBI Entrez

ELink, available from Biopython as Bio.Entrez.elink(), can be used to find related items in the NCBI Entrez databases. For example, you can use this to find nucleotide entries for an entry in the gene database, and other cool stuff.

Let’s use ELink to find articles related to the Biopython application note published in Bioinformatics in 2009. The PubMed ID of this article is 19304878:

In [26]:
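from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
pmid = "19304878"
record = Entrez.read(Entrez.elink(dbfrom="pubmed", id=pmid))
print(record[0].keys())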
print('The record is from the {} database.'.format(record[0]["DbFrom"]))
print('The IdList is {}.'.format(record[0]["IdList"]))

dict_keys(['ERROR', 'LinkSetDbHistory', 'IdList', 'LinkSetDb', 'DbFrom'])
The record is from the pubmed database.
The IdList is ['19304878'].

The record variable consists of a Python list, one for each database in which we searched. Since we specified only one PubMed ID to search for, record contains only one item. This item is a dictionary containing information about our search term, as well as all the related items that were found.

The "LinkSetDb" key contains the search results, stored as a list consisting of one item for each target database. In our search results, we only find hits in the PubMed database (although sub-divided into categories):

In [27]:
print('There are {} search results'.format(len(record[0]["LinkSetDb"])))
for linksetdb in record[0]["LinkSetDb"]:
    print(linksetdb["DbTo"], linksetdb["LinkName"], len(linksetdb["Link"]))

There are 8 search results
pubmed pubmed_pubmed 224
pubmed pubmed_pubmed_alsoviewed 3
pubmed pubmed_pubmed_citedin 276
pubmed pubmed_pubmed_combined 6
pubmed pubmed_pubmed_five 6
pubmed pubmed_pubmed_refs 17
pubmed pubmed_pubmed_reviews 8
pubmed pubmed_pubmed_reviews_five 6
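The nested structure is easier to work with once it is flattened into a dictionary keyed by link name. The sketch below does this on a hand-made structure shaped like the ELink result above. The helper name links_by_name is hypothetical, and the sample holds only a few links per category, so the membership of each category is made up for the example.

```python
# Hand-made structure shaped like the ELink result above, with only a few
# links per category (the category contents are made up for illustration).
record = [
    {
        "DbFrom": "pubmed",
        "IdList": ["19304878"],
        "LinkSetDb": [
            {
                "DbTo": "pubmed",
                "LinkName": "pubmed_pubmed",
                "Link": [{"Id": "19304878"}, {"Id": "14630660"}],
            },
            {
                "DbTo": "pubmed",
                "LinkName": "pubmed_pubmed_refs",
                "Link": [{"Id": "14630660"}],
            },
        ],
    }
]


def links_by_name(linkset):
    """Map each LinkName to the list of linked IDs."""
    return {
        entry["LinkName"]: [link["Id"] for link in entry["Link"]]
        for entry in linkset["LinkSetDb"]
    }


links = links_by_name(record[0])
print(links["pubmed_pubmed"])
```

Applied to a real ELink result, the same function would give one list of IDs per category printed above.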

The actual search results are stored under the "Link" key. In total, 224 items were found under the standard search (pubmed_pubmed). Let’s now look at the first search result:

In [28]:
record[0]["LinkSetDb"][0]["Link"][0]

{'Id': '19304878'}

This is the article we searched for, which doesn’t help us much, so let’s look at the second search result:

In [29]:
record[0]["LinkSetDb"][0]["Link"][1]

{'Id': '14630660'}

This paper, with PubMed ID 14630660, is about the Biopython PDB parser.

We can use a loop to print out all PubMed IDs:

In [30]:
for link in record[0]["LinkSetDb"][0]["Link"]:
    print(link["Id"])


Now that was nice, but personally I am often more interested to find out if a paper has been cited. Well, ELink can do that too – at least for journals in PubMed Central (see Section [sec:elink-citations]).

For help on ELink, see the ELink help page. There is an entire sub-page just for the link names, describing how different databases can be cross referenced.

EGQuery: Global Query - counts for search terms

EGQuery provides counts for a search term in each of the Entrez databases (i.e. a global query). This is particularly useful to find out how many items your search terms would find in each database without actually performing lots of separate searches with ESearch (see the example in [subsec:entrez_example_genbank] below).

In this example, we use Bio.Entrez.egquery() to obtain the counts for “Biopython”:

In [31]:
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.egquery(term="biopython")
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    print(row["DbName"], row["Count"])

pubmed 21
pmc 560
mesh 0
books 2
pubmedhealth 2
omim 0
ncbisearch 0
nuccore 0
nucgss 0
nucest 0
protein 0
genome 0
structure 0
taxonomy 0
snp 0
dbvar 0
epigenomics 0
gene 0
sra 0
biosystems 0
unigene 0
cdd 0
clone 0
popset 0
geoprofiles 0
gds 16
homologene 0
pccompound 0
pcsubstance 0
pcassay 0
nlmcatalog 0
probe 0
gap 0
proteinclusters 0
bioproject 0
biosample 0

See the EGQuery help page for more information.
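Since most databases report no hits, it can help to keep only the non-zero counts and rank them. The sketch below works on a few rows copied from the output above. It assumes that the counts arrive as strings, as they do in the ESearch record shown earlier, and converts them before comparing.

```python
# A few rows copied from the output above, in the shape used by the loop
# over record["eGQueryResult"].
rows = [
    {"DbName": "pubmed", "Count": "21"},
    {"DbName": "pmc", "Count": "560"},
    {"DbName": "mesh", "Count": "0"},
    {"DbName": "books", "Count": "2"},
    {"DbName": "gds", "Count": "16"},
]

# Convert the counts to integers and drop databases without hits.
hits = {row["DbName"]: int(row["Count"]) for row in rows if int(row["Count"]) > 0}

# Database names ordered from most to fewest hits.
ranked = sorted(hits, key=hits.get, reverse=True)
print(ranked)  # ['pmc', 'pubmed', 'gds', 'books']
```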

ESpell: Obtaining spelling suggestions

ESpell retrieves spelling suggestions. In this example, we use Bio.Entrez.espell() to obtain the correct spelling of Biopython:

In [32]:
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.espell(term="biopythooon")
record = Entrez.read(handle)
record["Query"]

In [33]:
record["CorrectedQuery"]


See the ESpell help page for more information. The main use of this is for GUI tools to provide automatic suggestions for search terms.

Parsing huge Entrez XML files

The function Entrez.read reads the entire XML file returned by Entrez into a single Python object, which is kept in memory. To parse Entrez XML files too large to fit in memory, you can use the function Entrez.parse. This is a generator function that reads records in the XML file one by one. This function is only useful if the XML file reflects a Python list object (in other words, if Entrez.read on a computer with infinite memory resources would return a Python list).
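The idea of reading records one at a time can be illustrated with the standard library alone. The sketch below is an analogue of this streaming approach, not Biopython’s implementation: it uses xml.etree.ElementTree.iterparse on a tiny made-up document whose element names (GeneSet, Gene, Id, Symbol) are assumptions and do not follow the real Entrez Gene schema.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up document standing in for a multi-gigabyte file.
SAMPLE = b"""<?xml version="1.0"?>
<GeneSet>
  <Gene><Id>1</Id><Symbol>A1BG</Symbol></Gene>
  <Gene><Id>2</Id><Symbol>A2M</Symbol></Gene>
  <Gene><Id>9</Id><Symbol>NAT1</Symbol></Gene>
</GeneSet>"""


def iter_genes(handle):
    """Yield (id, symbol) pairs one record at a time."""
    for event, element in ET.iterparse(handle, events=("end",)):
        if element.tag == "Gene":
            yield element.findtext("Id"), element.findtext("Symbol")
            element.clear()  # release the children of the finished record


genes = list(iter_genes(io.BytesIO(SAMPLE)))
print(genes)
```

Because iter_genes is a generator, a caller that loops over it directly, instead of building a list as done here for the demonstration, only ever holds one record at a time.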

For example, you can download the entire Entrez Gene database for a given organism as a file from NCBI’s ftp site. These files can be very large. As an example, on September 4, 2009, the file Homo_sapiens.ags.gz, containing the Entrez Gene database for human, had a size of 116576 kB. This file, which is in the ASN format, can be converted into an XML file using NCBI’s gene2xml program (see NCBI’s ftp site for more information):

gene2xml -b T -i Homo_sapiens.ags -o Homo_sapiens.xml

The resulting XML file has a size of 6.1 GB. Attempting Entrez.read on this file will result in a MemoryError on many computers.

The XML file Homo_sapiens.xml consists of a list of Entrez gene records, each corresponding to one Entrez gene in human. Entrez.parse retrieves these gene records one by one. You can then print out or store the relevant information in each record by iterating over the records. For example, this script iterates over the Entrez gene records and prints out the gene numbers and names for all current genes:

TODO: need alternate example, download option or ...

from Bio import Entrez
handle = open("Homo_sapiens.xml", "rb")  # binary mode, as in the other examples
records = Entrez.parse(handle)
for record in records:
    status = record['Entrezgene_track-info']['Gene-track']['Gene-track_status']
    if status.attributes['value'] == 'discontinued':
        continue  # skip genes that are no longer current
    geneid = record['Entrezgene_track-info']['Gene-track']['Gene-track_geneid']
    genename = record['Entrezgene_gene']['Gene-ref']['Gene-ref_locus']
    print(geneid, genename)
handle.close()

This will print:

1 A1BG
2 A2M
3 A2MP
8 AA
9 NAT1
10 NAT2
17 AAVS1

Handling errors

Three things can go wrong when parsing an XML file:

  • The file may not be an XML file to begin with;

  • The file may end prematurely or otherwise be corrupted;

  • The file may be correct XML, but contain items that are not represented in the associated DTD.

The first case occurs if, for example, you try to parse a Fasta file as if it were an XML file:

In [34]:
from Bio import Entrez
from Bio.Entrez.Parser import NotXMLError
handle = open("data/NC_005816.fna", 'rb') # a Fasta file
try:
    record = Entrez.read(handle)
except NotXMLError as e:
    print('We are expecting to get NotXMLError')
    print(e)

We are expecting to get NotXMLError
Failed to parse the XML data (syntax error: line 1, column 0). Please make sure that the input data are in XML format.

Here, the parser didn’t find the <?xml ... tag with which an XML file is supposed to start, and therefore decides (correctly) that the file is not an XML file.

When your file is in the XML format but is corrupted (for example, by ending prematurely), the parser will raise a CorruptedXMLError. Here is an example of an XML file that ends prematurely:

<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD eInfoResult, 11 May 2002//EN" "">

which will generate the following traceback:

ExpatError                                Traceback (most recent call last)
/Users/vincentdavis/anaconda/envs/py35/lib/python3.5/site-packages/Bio/Entrez/Parser.py in read(self, handle)
    214         try:
--> 215             self.parser.ParseFile(handle)
    216         except expat.ExpatError as e:

ExpatError: syntax error: line 1, column 0

During handling of the above exception, another exception occurred:

NotXMLError                               Traceback (most recent call last)
<ipython-input-63-ac0523d72453> in <module>()
----> 1

/Users/vincentdavis/anaconda/envs/py35/lib/python3.5/site-packages/Bio/Entrez/__init__.py in read(handle, validate)
    419     from .Parser import DataHandler
    420     handler = DataHandler(validate)
--> 421     record = handler.read(handle)
    422     return record

/Users/vincentdavis/anaconda/envs/py35/lib/python3.5/site-packages/Bio/Entrez/Parser.py in read(self, handle)
    223                 # We have not seen the initial <!xml declaration, so probably
    224                 # the input data is not in XML format.
--> 225                 raise NotXMLError(e)
    226         try:
    227             return self.object

NotXMLError: Failed to parse the XML data (syntax error: line 1, column 0). Please make sure that the input data are in XML format.

Note that the error message tells you at what point in the XML file the error was detected.
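The difference between the first two cases can be sketched with Python's standard xml.parsers.expat module, which the traceback above shows Bio.Entrez building on. This is only an illustration, not Bio.Entrez's actual implementation: the function name classify_xml and the rule "a parse error before any XML declaration was seen means the input is not XML" are assumptions made for this example.

```python
import xml.parsers.expat


def classify_xml(data):
    """Return a short description of how expat copes with the bytes in data."""
    seen_declaration = False

    def on_xml_declaration(version, encoding, standalone):
        # Called by expat when it reads the <?xml ...?> declaration.
        nonlocal seen_declaration
        seen_declaration = True

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_declaration
    try:
        parser.Parse(data, True)  # True: this is the final chunk of data
    except xml.parsers.expat.ExpatError as error:
        # Assumed rule for this sketch: no declaration seen means "not XML".
        kind = "corrupted XML" if seen_declaration else "not XML"
        return "{} ({})".format(kind, error)
    return "well-formed XML"


fasta = b">seq1\nACGT\n"                                    # not XML at all
truncated = b'<?xml version="1.0"?>\n<eInfoResult><DbList>'  # ends prematurely
complete = b'<?xml version="1.0"?>\n<eInfoResult></eInfoResult>'

for data in (fasta, truncated, complete):
    print(classify_xml(data))
```

In both failing cases the message produced by expat includes the line and column at which the problem was detected.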

The third type of error occurs if the XML file contains tags that do not have a description in the corresponding DTD file. This is an example of such an XML file:

<?xml version="1.0"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD eInfoResult, 11 May 2002//EN" "">
        <Description>PubMed bibliographic record</Description>
        <LastUpdate>2010/09/10 04:52</LastUpdate>

In this file, for some reason the tag <DocsumList> (and several others) is not listed in the DTD file eInfo_020511.dtd, which is specified on the second line as the DTD for this XML file. By default, the parser will stop and raise a ValidationError if it cannot find some tag in the DTD:

from Bio import Entrez
handle = open("data/einfo3.xml", 'rb')
record = Entrez.read(handle)
ValidationError                           Traceback (most recent call last)
<ipython-input-65-cfb96ec3d2ca> in <module>()
      1 from Bio import Entrez
      2 handle = open("data/einfo3.xml", 'rb')
----> 3 record = Entrez.read(handle)

/Users/vincentdavis/anaconda/envs/py35/lib/python3.5/site-packages/Bio/Entrez/ in read(handle, validate)
    419     from .Parser import DataHandler
    420     handler = DataHandler(validate)
--> 421     record =
    422     return record

/Users/vincentdavis/anaconda/envs/py35/lib/python3.5/site-packages/Bio/Entrez/ in read(self, handle)
    213             raise IOError("Can't parse a closed handle")
    214         try:
--> 215             self.parser.ParseFile(handle)
    216         except expat.ExpatError as e:
    217             if self.parser.StartElementHandler:

-------src-dir--------/Python-3.5.1/Modules/pyexpat.c in StartElement()

/Users/vincentdavis/anaconda/envs/py35/lib/python3.5/site-packages/Bio/Entrez/ in startElementHandler(self, name, attrs)
    348             # Element not found in DTD
    349             if self.validating:
--> 350                 raise ValidationError(name)
    351             else:
    352                 # this will not be stored in the record

ValidationError: Failed to find tag 'DocsumList' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.

Optionally, you can instruct the parser to skip such tags instead of raising a ValidationError. This is done by calling Entrez.read or Entrez.parse with the argument validate equal to False:

In [35]:
from Bio import Entrez
handle = open("data/einfo3.xml", 'rb')
record = Entrez.read(handle, validate=False)

Of course, the information contained in the XML tags that are not in the DTD is not present in the record returned by Entrez.read.

Specialized parsers

The Bio.Entrez.read function can parse most (if not all) XML output returned by Entrez. Entrez typically also allows you to retrieve records in other formats, which may have some advantages compared to the XML format in terms of readability (or download size).

To request a specific file format from Entrez using Bio.Entrez.efetch(), specify the rettype and/or retmode optional arguments. The different combinations are described for each database type on the NCBI efetch webpage.

One obvious case is that you may prefer to download sequences in the FASTA or GenBank/GenPept plain text formats (which can then be parsed with Bio.SeqIO, see Sections [sec:SeqIO_GenBank_Online] and EFetch: Downloading full records from Entrez). For the literature databases, Biopython contains a parser for the MEDLINE format used in PubMed.

Parsing Medline records {#subsec:entrez-and-medline}

You can find the Medline parser in Bio.Medline. Suppose we want to parse the file pubmed_result1.txt, containing one Medline record. You can find this file in Biopython’s Tests\Medline directory. The file looks like this:

PMID- 12230038
DA  - 20020916
DCOM- 20030606
LR  - 20041117
PUBM- Print
IS  - 1467-5463 (Print)
VI  - 3
IP  - 3
DP  - 2002 Sep
TI  - The Bio* toolkits--a brief overview.
PG  - 296-302
AB  - Bioinformatics research is often difficult to do with commercial software. The
      Open Source BioPerl, BioPython and Biojava projects provide toolkits with

We first open the file and then parse it:

In [36]:
from Bio import Medline
with open("data/pubmed_result1.txt") as handle:
    record = Medline.read(handle)

The record now contains the Medline record as a Python dictionary:

In [37]:


In [38]:
record["AB"]

'Bioinformatics research is often difficult to do with commercial software. The Open Source BioPerl, BioPython and Biojava projects provide toolkits with multiple functionality that make it easier to create customised pipelines or analysis. This review briefly compares the quirks of the underlying languages and the functionality, documentation, utility and relative advantages of the Bio counterparts, particularly from the point of view of the beginning biologist programmer.'

The key names used in a Medline record can be rather obscure; use

In [39]:
help(record)

Help on Record in module Bio.Medline object:

class Record(builtins.dict)
 |  A dictionary holding information from a Medline record.
 |  All data are stored under the mnemonic appearing in the Medline
 |  file. These mnemonics have the following interpretations:
 |  ========= ==============================
 |  Mnemonic  Description
 |  --------- ------------------------------
 |  AB        Abstract
 |  CI        Copyright Information
 |  AD        Affiliation
 |  IRAD      Investigator Affiliation
 |  AID       Article Identifier
 |  AU        Author
 |  FAU       Full Author
 |  CN        Corporate Author
 |  DCOM      Date Completed
 |  DA        Date Created
 |  LR        Date Last Revised
 |  DEP       Date of Electronic Publication
 |  DP        Date of Publication
 |  EDAT      Entrez Date
 |  GS        Gene Symbol
 |  GN        General Note
 |  GR        Grant Number
 |  IR        Investigator Name
 |  FIR       Full Investigator Name
 |  IS        ISSN
 |  IP        Issue
 |  TA        Journal Title Abbreviation
 |  JT        Journal Title
 |  LA        Language
 |  LID       Location Identifier
 |  MID       Manuscript Identifier
 |  MHDA      MeSH Date
 |  MH        MeSH Terms
 |  JID       NLM Unique ID
 |  RF        Number of References
 |  OAB       Other Abstract
 |  OCI       Other Copyright Information
 |  OID       Other ID
 |  OT        Other Term
 |  OTO       Other Term Owner
 |  OWN       Owner
 |  PG        Pagination
 |  PS        Personal Name as Subject
 |  FPS       Full Personal Name as Subject
 |  PL        Place of Publication
 |  PHST      Publication History Status
 |  PST       Publication Status
 |  PT        Publication Type
 |  PUBM      Publishing Model
 |  PMC       PubMed Central Identifier
 |  PMID      PubMed Unique Identifier
 |  RN        Registry Number/EC Number
 |  NM        Substance Name
 |  SI        Secondary Source ID
 |  SO        Source
 |  SFM       Space Flight Mission
 |  STAT      Status
 |  SB        Subset
 |  TI        Title
 |  TT        Transliterated Title
 |  VI        Volume
 |  CON       Comment on
 |  CIN       Comment in
 |  EIN       Erratum in
 |  EFR       Erratum for
 |  CRI       Corrected and Republished in
 |  CRF       Corrected and Republished from
 |  PRIN      Partial retraction in
 |  PROF      Partial retraction of
 |  RPI       Republished in
 |  RPF       Republished from
 |  RIN       Retraction in
 |  ROF       Retraction of
 |  UIN       Update in
 |  UOF       Update of
 |  SPIN      Summary for patients in
 |  ORI       Original report in
 |  ========= ==============================

for a brief summary.
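To make the mapping from the flat file to the dictionary concrete, here is a minimal sketch of a reader for the layout shown above: a mnemonic in the first four columns, then "- ", then the value, with continuation lines indented by six spaces. It is not how Bio.Medline is implemented, and the function name parse_medline_text is made up for this example. As a simplification, every value is stored in a list, whereas Bio.Medline returns some fields as plain strings.

```python
def parse_medline_text(text):
    """Parse one Medline flat-file record into a dict of lists (simplified)."""
    record = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        if line[:4].strip():              # a mnemonic starts a new field
            key = line[:4].strip()
            value = line[6:].strip()      # skip the "- " separator
            record.setdefault(key, []).append(value)
        elif key is not None:             # indented line continues the field
            record[key][-1] += " " + line.strip()
    return record


# A fragment of the record shown earlier in this section.
sample = """PMID- 12230038
VI  - 3
TI  - The Bio* toolkits--a brief overview.
AB  - Bioinformatics research is often difficult to do with commercial software. The
      Open Source BioPerl, BioPython and Biojava projects provide toolkits with
"""

parsed = parse_medline_text(sample)
print(parsed["PMID"])
print(parsed["AB"][0])
```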

To parse a file containing multiple Medline records, you can use the parse function instead:

In [40]:
from Bio import Medline
with open("data/pubmed_result2.txt") as handle:
    for record in Medline.parse(handle):
        print(record["TI"])

A high level interface to SCOP and ASTRAL implemented in python.
GenomeDiagram: a python package for the visualization of large-scale genomic data.
Open source clustering software.
PDB file parser and structure class implemented in Python.

Instead of parsing Medline records stored in files, you can also parse Medline records downloaded by Bio.Entrez.efetch. For example, let’s look at all Medline records in PubMed related to Biopython:

In [41]:
from Bio import Entrez
Entrez.email = ""     # Always tell NCBI who you are: put your email address here
handle = Entrez.esearch(db="pubmed", term="biopython")
record = Entrez.read(handle)
record["IdList"]

['24929426', '24497503', '24267035', '24194598', '23842806', '23157543', '22909249', '22399473', '21666252', '21210977', '20015970', '19811691', '19773334', '19304878', '18606172', '21585724', '16403221', '16377612', '14871861', '14630660']

We now use Bio.Entrez.efetch to download these Medline records:

In [42]:
idlist = record["IdList"]
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")

Here, we specify rettype="medline", retmode="text" to obtain the Medline records in plain-text Medline format. Now we use Bio.Medline to parse these records:

In [43]:
from Bio import Medline
records = Medline.parse(handle)
for record in records:
    print(record.get("AU", "?"))

['Waldmann J', 'Gerken J', 'Hankeln W', 'Schweer T', 'Glockner FO']
['Mielke CJ', 'Mandarino LJ', 'Dinu V']
['Gajda MJ']
['Mathelier A', 'Zhao X', 'Zhang AW', 'Parcy F', 'Worsley-Hunt R', 'Arenillas DJ', 'Buchman S', 'Chen CY', 'Chou A', 'Ienasescu H', 'Lim J', 'Shyr C', 'Tan G', 'Zhou M', 'Lenhard B', 'Sandelin A', 'Wasserman WW']
['Morales HF', 'Giovambattista G']
['Baldwin S', 'Revanna R', 'Thomson S', 'Pither-Joyce M', 'Wright K', 'Crowhurst R', 'Fiers M', 'Chen L', 'Macknight R', 'McCallum JA']
['Talevich E', 'Invergo BM', 'Cock PJ', 'Chapman BA']
['Prins P', 'Goto N', 'Yates A', 'Gautier L', 'Willis S', 'Fields C', 'Katayama T']
['Schmitt T', 'Messina DN', 'Schreiber F', 'Sonnhammer EL']
['Antao T']
['Cock PJ', 'Fields CJ', 'Goto N', 'Heuer ML', 'Rice PM']
['Jankun-Kelly TJ', 'Lindeman AD', 'Bridges SM']
['Korhonen J', 'Martinmaki P', 'Pizzi C', 'Rastas P', 'Ukkonen E']
['Cock PJ', 'Antao T', 'Chang JT', 'Chapman BA', 'Cox CJ', 'Dalke A', 'Friedberg I', 'Hamelryck T', 'Kauff F', 'Wilczynski B', 'de Hoon MJ']
['Munteanu CR', 'Gonzalez-Diaz H', 'Magalhaes AL']
['Faircloth BC']
['Casbon JA', 'Crooks GE', 'Saqi MA']
['Pritchard L', 'White JA', 'Birch PR', 'Toth IK']
['de Hoon MJ', 'Imoto S', 'Nolan J', 'Miyano S']
['Hamelryck T', 'Manderick B']

For comparison, here we show an example using the XML format:

In [44]:
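from Bio import Entrez
Entrez.email = ""     # Always tell NCBI who you are: put your email address here
handle = Entrez.esearch(db="pubmed", term="biopython")
record = Entrez.read(handle)
idlist = record["IdList"]
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="xml")
records = Entrez.parse(handle)
for record in records:
    print(record['MedlineCitation']['Article']['ArticleTitle'])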

FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences.
AMASS: a database for investigating protein structures.
HPDB-Haskell library for processing atomic biomolecular structures in Protein Data Bank format.
JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles.
BioSmalltalk: a pure object system and library for bioinformatics.
A toolkit for bulk PCR-based marker design from next-generation sequence data: application for development of a framework linkage map in bulb onion (Allium cepa L.).
Bio.Phylo: a unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython.
Sharing programming resources between Bio* projects through remote procedure call and native call stack strategies.
Letter to the editor: SeqXML and OrthoXML: standards for sequence and orthology information.
interPopula: a Python API to access the HapMap Project dataset.
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.
Exploratory visual analysis of conserved domains on multiple sequence alignments.
MOODS: fast search for position weight matrix matches in DNA sequences.
Biopython: freely available Python tools for computational molecular biology and bioinformatics.
Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices.
msatcommander: detection of microsatellite repeat arrays and automated, locus-specific primer design.
A high level interface to SCOP and ASTRAL implemented in python.
GenomeDiagram: a python package for the visualization of large-scale genomic data.
Open source clustering software.
PDB file parser and structure class implemented in Python.

Note that in both of these examples, for simplicity we have naively combined ESearch and EFetch. In this situation, the NCBI would expect you to use their history feature, as illustrated in Section History and WebEnv.

Parsing GEO records

GEO (Gene Expression Omnibus) is a data repository of high-throughput gene expression and hybridization array data. The Bio.Geo module can be used to parse GEO-formatted data.

The following code fragment shows how to parse the example GEO file GSE16.txt into a record and print the record:

In [45]:
from Bio import Geo
handle = open("data/GSE16.txt")
records = Geo.parse(handle)
for record in records:
    print(record)

GEO Id: GSM804
Sample_author: Antoine,M,Snijders

Sample_author: Norma,,Nowak

Sample_author: Richard,,Segraves

Sample_author: Stephanie,,Blackwood

Sample_author: Nils,,Brown

Sample_author: Jeffery,,Conroy

Sample_author: Greg,,Hamilton

Sample_author: Anna,K,Hindle

Sample_author: Bing,,Huey

Sample_author: Karen,,Kimura

Sample_author: Sindy,,Law

Sample_author: Ken,,Myambo

Sample_author: Joel,,Palmer

Sample_author: Bauke,,Ylstra

Sample_author: Jingzhu,P,Yue

Sample_author: Joe,W,Gray

Sample_author: Ajay,N,Jain

Sample_author: Daniel,,Pinkel

Sample_author: Donna,G,Albertson

Sample_description: Coriell Cell Repositories cell line <a h

Sample_description: Fibroblast cell line derived from a 1 mo
nth old female with multiple congenital malformations, dysmorphic features, intr
auterine growth retardation, heart murmur, cleft palate, equinovarus deformity, 
microcephaly, coloboma of right iris, clinodactyly, reduced RBC catalase activit
y, and 1 copy of catalase gene.

Sample_description: Chromosome abnormalities are present.

Sample_description: Karyotype is 46,XX,-11,+der(11)inv ins(1
1;10)(11pter> 11p13::10q21>10q24::11p13>11qter)mat

Sample_organism: Homo sapiens

Sample_platform_id: GPL28

Sample_pubmed_id: 11687795

Sample_series_id: GSE16

Sample_status: Public on Feb 12 2002

Sample_submission_date: Jan 17 2002

Sample_submitter_city: San Francisco,CA,94143,USA

Sample_submitter_department: Comprehensive Cancer Center


Sample_submitter_institute: University of California San Francisco

Sample_submitter_name: Donna,G,Albertson

Sample_submitter_phone: 415 502-8463

Sample_target_source1: Cell line GM05296

Sample_target_source2: normal male reference genomic DNA

Sample_title: CGH_Albertson_GM05296-001218

Sample_type: dual channel genomic

Column Header Definitions
    ID_REF: Unique row identifier, genome position o

    LINEAR_RATIO: Mean of replicate Cy3/Cy5 ratios

    LOG2STDDEV: Standard deviation of VALUE

    NO_REPLICATES: Number of replicate spot measurements

    VALUE: aka LOG2RATIO, mean of log base 2 of LIN

1: 1		1.047765	0.011853	3	
2: 2				0	
3: 3	0.008824	1.006135	0.00143	3	
4: 4	-0.000894	0.99938	0.001454	3	
5: 5	0.075875	1.054	0.003077	3	
6: 6	0.017303	1.012066	0.005876	2	
7: 7	-0.006766	0.995321	0.013881	3	
8: 8	0.020755	1.014491	0.005506	3	
9: 9	-0.094938	0.936313	0.012662	3	
10: 10	-0.054527	0.96291	0.01073	3	
11: 11	-0.025057	0.982782	0.003855	3	
12: 12				0	
13: 13	0.108454	1.078072	0.005196	3	
14: 14	0.078633	1.056017	0.009165	3	
15: 15	0.098571	1.070712	0.007834	3	
16: 16	0.044048	1.031003	0.013651	3	
17: 17	0.018039	1.012582	0.005471	3	
18: 18	-0.088807	0.9403	0.010571	3	
19: 19	0.016349	1.011397	0.007113	3	
20: 20	0.030977	1.021704	0.016798	3	

You can search the “gds” database (GEO datasets) with ESearch:

In [46]:
from Bio import Entrez
Entrez.email = "" # Always tell NCBI who you are: put your email address here
handle = Entrez.esearch(db="gds", term="GSE16")
record = Entrez.read(handle)


In [47]:
record["IdList"]

['200000016', '100000028', '300000818', '300000817', '300000816', '300000815', '300000814', '300000813', '300000812', '300000811', '300000810', '300000809', '300000808', '300000807', '300000806', '300000805', '300000804', '300000803', '300000802', '300000801']

From the Entrez website, UID “200000016” is GDS16, while the hit “100000028” is for the associated platform, GPL28. Unfortunately, at the time of writing the NCBI don’t seem to support downloading GEO files using Entrez (neither as XML nor in the Simple Omnibus Format in Text (SOFT)).

However, it is actually pretty straightforward to download the GEO files by FTP or HTTP from instead. In this case you might want (a compressed file, see the Python module gzip).
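Once a compressed file is on disk, the gzip module can open it in text mode so that it can be read line by line like an ordinary file. The sketch below writes a tiny gzip file itself so that it runs without any download; the file name example.soft.gz is hypothetical, and the three lines are a hand-written fragment in the SOFT style (using values from the GSM804 record printed above), not a real GEO download.

```python
import gzip
import os
import tempfile

# Hand-written SOFT-style fragment, for illustration only.
soft_text = (
    "^SAMPLE = GSM804\n"
    "!Sample_title = CGH_Albertson_GM05296-001218\n"
    "!Sample_organism = Homo sapiens\n"
)

# Write a small compressed file to a temporary directory (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "example.soft.gz")
with gzip.open(path, "wt") as output:
    output.write(soft_text)

# "rt" opens the compressed file in text mode, decompressing on the fly.
with gzip.open(path, "rt") as handle:
    lines = [line.rstrip("\n") for line in handle]

print(lines[0])
```

A text-mode handle obtained this way should be usable wherever a handle from open() is expected, for example with Geo.parse as in the example above.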

Parsing UniGene records

UniGene is an NCBI database of the transcriptome, with each UniGene record showing the set of transcripts that are associated with a particular gene in a specific organism. A typical UniGene record looks like this:

ID          Hs.2
TITLE       N-acetyltransferase 2 (arylamine N-acetyltransferase)
GENE        NAT2
CYTOBAND    8p22
GENE_ID     10
EXPRESS      bone| connective tissue| intestine| liver| liver tumor| normal| soft tissue/muscle tissue tumor| adult
RESTR_EXPR   adult
STS         ACC=PMC310725P3 UNISTS=272646
STS         ACC=WIAF-2120 UNISTS=44576
STS         ACC=G59899 UNISTS=137181
STS         ACC=GDB:187676 UNISTS=155563
PROTSIM     ORG=10090; PROTGI=6754794; PROTID=NP_035004.1; PCT=76.55; ALN=288
PROTSIM     ORG=9796; PROTGI=149742490; PROTID=XP_001487907.1; PCT=79.66; ALN=288
PROTSIM     ORG=9986; PROTGI=126722851; PROTID=NP_001075655.1; PCT=76.90; ALN=288
PROTSIM     ORG=9598; PROTGI=114619004; PROTID=XP_519631.2; PCT=98.28; ALN=288

SCOUNT      38
SEQUENCE    ACC=BC067218.1; NID=g45501306; PID=g45501307; SEQTYPE=mRNA
SEQUENCE    ACC=NM_000015.2; NID=g116295259; PID=g116295260; SEQTYPE=mRNA
SEQUENCE    ACC=D90042.1; NID=g219415; PID=g219416; SEQTYPE=mRNA
SEQUENCE    ACC=D90040.1; NID=g219411; PID=g219412; SEQTYPE=mRNA
SEQUENCE    ACC=BC015878.1; NID=g16198419; PID=g16198420; SEQTYPE=mRNA
SEQUENCE    ACC=CR407631.1; NID=g47115198; PID=g47115199; SEQTYPE=mRNA
SEQUENCE    ACC=BG569293.1; NID=g13576946; CLONE=IMAGE:4722596; END=5'; LID=6989; SEQTYPE=EST; TRACE=44157214
SEQUENCE    ACC=AU099534.1; NID=g13550663; CLONE=HSI08034; END=5'; LID=8800; SEQTYPE=EST

This particular record shows the set of transcripts (shown in the SEQUENCE lines) that originate from the human gene NAT2, encoding an N-acetyltransferase. The PROTSIM lines show proteins with significant similarity to NAT2, whereas the STS lines show the corresponding sequence-tagged sites in the genome.

To parse UniGene files, use the Bio.UniGene module:

TODO: Need a working example

In [48]:
# from Bio import UniGene
# input = open("data/")
# record = UniGene.read(input)

The record returned by UniGene.read is a Python object with attributes corresponding to the fields in the UniGene record. For example,

In [49]:
# record.ID

In [50]:
# record.title

The EXPRESS and RESTR_EXPR lines are stored as Python lists of strings:

['bone', 'connective tissue', 'intestine', 'liver', 'liver tumor', 'normal', 'soft tissue/muscle tissue tumor', 'adult']

Specialized objects are returned for the STS, PROTSIM, and SEQUENCE lines, storing the keys shown in each line as attributes:

In [51]:
# record.sts[0].acc

In [52]:
# record.sts[0].unists

and similarly for the PROTSIM and SEQUENCE lines.
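Since the cells above are still placeholders, the following sketch shows in plain Python how the fields of such a record can be pulled apart. It does not use Bio.UniGene (which may not be available in your Biopython installation) and is not that module's implementation: the function names are made up for this example, and the result is a plain dictionary with lower-case keys rather than an object with attributes.

```python
def parse_key_values(text):
    """Turn 'ACC=G59899 UNISTS=137181' or 'ORG=10090; PROTGI=...' into a dict."""
    fields = {}
    for item in text.replace(";", " ").split():
        key, _, value = item.partition("=")
        fields[key.lower()] = value
    return fields


def parse_unigene_record(text):
    """Parse one UniGene flat-file record into a dictionary (simplified)."""
    record = {"sts": [], "protsim": [], "sequence": []}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, rest = line.partition(" ")
        rest = rest.strip()
        if tag in ("STS", "PROTSIM", "SEQUENCE"):
            # Repeated lines holding KEY=value pairs
            record[tag.lower()].append(parse_key_values(rest))
        elif tag in ("EXPRESS", "RESTR_EXPR"):
            # "|"-separated lists of terms
            record[tag.lower()] = [part.strip() for part in rest.split("|")]
        else:
            record[tag.lower()] = rest
    return record


# A few lines copied from the record shown above.
sample = """ID          Hs.2
TITLE       N-acetyltransferase 2 (arylamine N-acetyltransferase)
GENE        NAT2
EXPRESS      bone| connective tissue| intestine| liver| liver tumor| normal| soft tissue/muscle tissue tumor| adult
STS         ACC=PMC310725P3 UNISTS=272646
PROTSIM     ORG=10090; PROTGI=6754794; PROTID=NP_035004.1; PCT=76.55; ALN=288
"""

record = parse_unigene_record(sample)
print(record["id"], record["title"])
print(record["express"])
print(record["sts"][0]["acc"], record["protsim"][0]["pct"])
```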

To parse a file containing more than one UniGene record, use the parse function in Bio.UniGene:

TODO: Need a working example

In [53]:
# from Bio import UniGene
# input = open("")
# records = UniGene.parse(input)
# for record in records:
#     print(record.ID)

Using a proxy

Normally you won’t have to worry about using a proxy, but if this is an issue on your network here is how to deal with it. Internally, Bio.Entrez uses the standard Python library urllib for accessing the NCBI servers. This will check an environment variable called http_proxy to configure any simple proxy automatically. Unfortunately this module does not support the use of proxies which require authentication.

You may choose to set the http_proxy environment variable once (how you do this will depend on your operating system). Alternatively you can set this within Python at the start of your script, for example:

import os
os.environ["http_proxy"] = ""

See the urllib documentation for more details.
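To check which proxy settings urllib will pick up, you can ask it directly with urllib.request.getproxies(). In the sketch below the proxy address is a made-up placeholder; replace it with the address used on your network.

```python
import os
import urllib.request

# Hypothetical proxy address, for illustration only.
os.environ["http_proxy"] = "http://proxy.example.org:3128"

# getproxies() returns a dictionary mapping URL schemes to proxy URLs,
# built from the *_proxy environment variables when they are set.
proxies = urllib.request.getproxies()
print(proxies.get("http"))
```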


PubMed and Medline {#subsec:pub_med}

If you are in the medical field or interested in human issues (and many times even if you are not!), PubMed is an excellent source of all kinds of goodies. So like other things, we’d like to be able to grab information from it and use it in Python scripts.

In this example, we will query PubMed for all articles having to do with orchids (see section [sec:orchids] for our motivation). We first check how many of such articles there are:

In [54]:
from Bio import Entrez
Entrez.email = ""     # Always tell NCBI who you are: put your email address here
handle = Entrez.egquery(term="orchid")
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    if row["DbName"]=="pubmed":
        print(row["Count"])


Now we use the Bio.Entrez.esearch function to retrieve the PubMed IDs of these 463 articles:

In [55]:
handle = Entrez.esearch(db="pubmed", term="orchid", retmax=463)
record = Entrez.read(handle)
idlist = record["IdList"]
print("The first 10 Id's containing all of the PubMed IDs of articles related to orchids:\n {}".format(idlist[:10]))

The first 10 Id's containing all of the PubMed IDs of articles related to orchids:
 ['26752741', '26743923', '26738548', '26732875', '26732614', '26724929', '26715121', '26713612', '26708054', '26694378']

Now that we’ve got them, we obviously want to get the corresponding Medline records and extract the information from them. Here, we’ll download the Medline records in the Medline flat-file format, and use the Bio.Medline module to parse them:

In [56]:
from Bio import Medline
handle = Entrez.efetch(db="pubmed", id=idlist, rettype="medline", retmode="text")

In [57]:
records = Medline.parse(handle)

NOTE - We’ve just done a separate search and fetch here; the NCBI much prefer you to take advantage of their history support in this situation. See Section History and WebEnv.

Keep in mind that records is an iterator, so you can iterate through the records only once. If you want to save the records, you can convert them to a list:

In [58]:
records = list(records)
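The effect of iterating twice is easy to see with a small stand-in for the parser. In this sketch, fake_records is a made-up generator that plays the role of Medline.parse; the two PMIDs are taken from the search result above.

```python
def fake_records():
    """Stand-in for Medline.parse: yields records one at a time."""
    yield {"PMID": "26752741"}
    yield {"PMID": "26743923"}


records = fake_records()
first_pass = [record["PMID"] for record in records]
second_pass = [record["PMID"] for record in records]   # iterator is exhausted
print(len(first_pass), len(second_pass))

# Converting to a list first keeps the records available for repeated use.
saved = list(fake_records())
again = [record["PMID"] for record in saved]
once_more = [record["PMID"] for record in saved]
print(again == once_more)
```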

Let’s now iterate over the records to print out some information about each record:

In [59]:
for record in records:
    print("title:", record.get("TI", "?"))
    print("authors:", record.get("AU", "?"))
    print("source:", record.get("SO", "?"))

title: Promise and Challenge of DNA Barcoding in Venus Slipper (Paphiopedilum).
authors: ['Guo YY', 'Huang LQ', 'Liu ZJ', 'Wang XQ']
source: PLoS One. 2016 Jan 11;11(1):e0146880. doi: 10.1371/journal.pone.0146880. eCollection 2016.

title: In vitro profiling of anti-MRSA activity of thymoquinone against selected type and clinical strains.
authors: ['Hariharan P', 'Paul-Satyaseela M', 'Gnanamani A']
source: Lett Appl Microbiol. 2016 Jan 7. doi: 10.1111/lam.12544.

title: Low glutathione redox state couples with a decreased ascorbate redox ratio to accelerate flowering in Oncidium orchid.
authors: ['Chin DC', 'Hsieh CC', 'Lin HY', 'Yeh KW']
source: Plant Cell Physiol. 2016 Jan 6. pii: pcv206.

title: Proteomic and morphometric study of the in vitro interaction between Oncidium sphacelatum Lindl. (Orchidaceae) and Thanatephorus sp. RG26 (Ceratobasidiaceae).
authors: ['Lopez-Chavez MY', 'Guillen-Navarro K', 'Bertolini V', 'Encarnacion S', 'Hernandez-Ortiz M', 'Sanchez-Moreno I', 'Damon A']
source: Mycorrhiza. 2016 Jan 6.

title: A transcriptome-wide, organ-specific regulatory map of Dendrobium officinale, an important traditional Chinese orchid herb.
authors: ['Meng Y', 'Yu D', 'Xue J', 'Lu J', 'Feng S', 'Shen C', 'Wang H']
source: Sci Rep. 2016 Jan 6;6:18864. doi: 10.1038/srep18864.

title: Methods for genetic transformation in Dendrobium.
authors: ['Teixeira da Silva JA', 'Dobranszki J', 'Cardoso JC', 'Chandler SF', 'Zeng S']
source: Plant Cell Rep. 2016 Jan 2.

title: Sebacina vermifera: a unique root symbiont with vast agronomic potential.
authors: ['Ray P', 'Craven KD']
source: World J Microbiol Biotechnol. 2016 Jan;32(1):16. doi: 10.1007/s11274-015-1970-7. Epub 2015 Dec 29.

title: Cuticular Hydrocarbons of Orchid Bees Males: Interspecific and Chemotaxonomy Variation.
authors: ['Dos Santos AB', 'do Nascimento FS']
source: PLoS One. 2015 Dec 29;10(12):e0145070. doi: 10.1371/journal.pone.0145070. eCollection 2015.

title: Sex and the Catasetinae (Darwin's favourite orchids).
authors: ['Perez-Escobar OA', 'Gottschling M', 'Whitten WM', 'Salazar G', 'Gerlach G']
source: Mol Phylogenet Evol. 2015 Dec 17. pii: S1055-7903(15)00372-3. doi: 10.1016/j.ympev.2015.11.019.

title: Comparative Transcriptome Analysis of Genes Involved in GA-GID1-DELLA Regulatory Module in Symbiotic and Asymbiotic Seed Germination of Anoectochilus roxburghii (Wall.) Lindl. (Orchidaceae).
authors: ['Liu SS', 'Chen J', 'Li SC', 'Zeng X', 'Meng ZX', 'Guo SX']
source: Int J Mol Sci. 2015 Dec 18;16(12):30190-203. doi: 10.3390/ijms161226224.

title: Dual Drug Loaded Nanoliposomal Chemotherapy: A Promising Strategy for Treatment of Head and Neck Squamous Cell Carcinoma.
authors: ['Mohan A', 'Narayanan S', 'Balasubramanian G', 'Sethuraman S', 'Krishnan UM']
source: Eur J Pharm Biopharm. 2015 Dec 9. pii: S0939-6411(15)00489-0. doi: 10.1016/j.ejpb.2015.11.017.

The output for this looks like:

title: Sex pheromone mimicry in the early spider orchid (ophrys sphegodes):
patterns of hydrocarbons as the key mechanism for pollination by sexual
deception [In Process Citation]
authors: ['Schiestl FP', 'Ayasse M', 'Paulus HF', 'Lofstedt C', 'Hansson BS',
'Ibarra F', 'Francke W']
source: J Comp Physiol [A] 2000 Jun;186(6):567-74

Especially interesting to note is the list of authors, which is returned as a standard Python list. This makes it easy to manipulate and search using standard Python tools. For instance, we could loop through a whole bunch of entries searching for a particular author with code like the following:

In [60]:
search_author = "Waits T"

In [61]:
for record in records:
    if "AU" not in record:
        continue    # skip records without an author list
    if search_author in record["AU"]:
        print("Author %s found: %s" % (search_author, record["SO"]))
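Because each record behaves like a dictionary, the same idea extends to other standard tools. As a rough sketch, the following counts how often each author appears across a set of records. The records here are hand-made dictionaries rather than parser output: three of the author lists are copied from the Biopython search earlier in this chapter, and one record deliberately has no "AU" key.

```python
from collections import Counter

# Hand-made stand-ins for parsed Medline records.
records = [
    {"AU": ['Talevich E', 'Invergo BM', 'Cock PJ', 'Chapman BA']},
    {"AU": ['Cock PJ', 'Fields CJ', 'Goto N', 'Heuer ML', 'Rice PM']},
    {"AU": ['Antao T']},
    {"TI": "A record without an author list"},
]

# record.get("AU", []) copes with records that lack the "AU" key.
author_counts = Counter(
    author for record in records for author in record.get("AU", [])
)
print(author_counts.most_common(1))
```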

Hopefully this section gave you an idea of the power and flexibility of the Entrez and Medline interfaces and how they can be used together.

Searching, downloading, and parsing Entrez Nucleotide records {#subsec:entrez_example_genbank}

Here we’ll show a simple example of performing a remote Entrez query. In section [sec:orchids] of the parsing examples, we talked about using NCBI’s Entrez website to search the NCBI nucleotide databases for info on Cypripedioideae, our friends the lady slipper orchids. Now, we’ll look at how to automate that process using a Python script. In this example, we’ll just show how to connect, get the results, and parse them, with the Entrez module doing all of the work.

First, we use EGQuery to find out the number of results we will get before actually downloading them. EGQuery will tell us how many search results were found in each of the databases, but for this example we are only interested in nucleotides:

In [62]:
from Bio import Entrez
Entrez.email = ""     # Always tell NCBI who you are: put your email address here
handle = Entrez.egquery(term="Cypripedioideae")
record = Entrez.read(handle)
for row in record["eGQueryResult"]:
    if row["DbName"]=="nuccore":
        print(row["Count"])


So, we expect to find 814 Entrez Nucleotide records (this is the number I obtained in 2008; it is likely to increase in the future). If you find some ridiculously high number of hits, you may want to reconsider if you really want to download all of them, which is our next step:

In [63]:
from Bio import Entrez
handle = Entrez.esearch(db="nucleotide", term="Cypripedioideae", retmax=814)
record = Entrez.read(handle)

Here, record is a Python dictionary containing the search results and some auxiliary information. Just for information, let’s look at what is stored in this dictionary:

In [64]:
record.keys()

dict_keys(['TranslationStack', 'IdList', 'TranslationSet', 'QueryTranslation', 'Count', 'RetStart', 'RetMax'])

First, let’s check how many results were found:

In [65]:
record["Count"]


which is the number we expected. The 814 results are stored in record['IdList']:

In [66]:
len(record["IdList"])


Let’s look at the first five results:

In [67]:
record["IdList"][:5]

['874509867', '874509089', '844174433', '937957673', '694174838']

[sec:entrez-batched-efetch] We can download these records using efetch. You could download them one by one, but to reduce the load on NCBI’s servers it is better to fetch several records in a single request, as shown below. In this situation, however, you should ideally use the history feature described later in Section History and WebEnv.

In [68]:
idlist = ",".join(record["IdList"][:5])
print(idlist)


In [69]:
handle = Entrez.efetch(db="nucleotide", id=idlist, retmode="xml")
records = Entrez.read(handle)
len(records)


Each of these records corresponds to one GenBank record.

In [70]:
print(records[0].keys())
dict_keys(['GBSeq_division', 'GBSeq_moltype', 'GBSeq_definition', 'GBSeq_topology', 'GBSeq_locus', 'GBSeq_strandedness', 'GBSeq_source', 'GBSeq_taxonomy', 'GBSeq_create-date', 'GBSeq_accession-version', 'GBSeq_references', 'GBSeq_sequence', 'GBSeq_other-seqids', 'GBSeq_primary-accession', 'GBSeq_length', 'GBSeq_update-date', 'GBSeq_feature-table', 'GBSeq_organism'])

In [71]:
print(records[0]["GBSeq_primary-accession"])

In [72]:
print(records[0]["GBSeq_other-seqids"])
['gnl|uoguelph|SCBI449-14.rbcLa', 'gb|KP644081.1|', 'gi|874509867']

In [73]:
print(records[0]["GBSeq_definition"])
Cypripedium calceolus voucher SNP_13_0359 ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, partial cds; chloroplast

In [74]:
print(records[0]["GBSeq_organism"])
Cypripedium calceolus

You could use this to quickly set up searches – but for heavy usage, see Section History and WebEnv.
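For ID lists longer than the five used above, the batching idea can be sketched as a small helper that yields comma-separated chunks, each suitable as the id argument of one efetch call. This is a sketch under stated assumptions: the name `batch_ids` and the default batch size of 200 are illustrative choices, not values prescribed by NCBI or Biopython.

```python
def batch_ids(id_list, batch_size=200):
    """Yield comma-separated strings holding at most batch_size IDs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(id_list), batch_size):
        yield ",".join(id_list[start:start + batch_size])


# The five IDs shown in the search results above, split into batches of two
# purely to make the chunking visible.
ids = ["874509867", "874509089", "844174433", "937957673", "694174838"]
batches = list(batch_ids(ids, batch_size=2))
for batch in batches:
    # In a real script each batch would be passed to
    # Entrez.efetch(db="nucleotide", id=batch, retmode="xml").
    print(batch)
```

Each request then carries a bounded number of identifiers, which keeps individual downloads small while still avoiding one request per record.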

Searching, downloading, and parsing GenBank records {#sec:entrez-search-fetch-genbank}

The GenBank record format is a very popular method of holding information about sequences, sequence features, and other associated sequence information. The format is a good way to get information from the NCBI databases.

In this example we’ll show how to query the NCBI databases, retrieve the records from the query, and then parse them using Bio.SeqIO, something touched on in Section [sec:SeqIO_GenBank_Online]. For simplicity, this example does not take advantage of the WebEnv history feature; see Section History and WebEnv for this.

First, we want to make a query and find out the IDs of the records to retrieve. Here we’ll do a quick search for one of our favorite organisms, Opuntia (prickly-pear cacti). We can do a quick search and get back the GIs (GenBank identifiers) for all of the corresponding records. First we check how many records there are:

In [75]:
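from Bio import Entrez
Entrez.email = ""     # Always tell NCBI who you are: put your own email address here
handle = Entrez.egquery(term="Opuntia AND rpl16")
record = Entrez.read(handle)  # parse the XML reply into Python objects
for row in record["eGQueryResult"]:
    if row["DbName"] == "nuccore":
        print(row["Count"])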


Now we download the list of GenBank identifiers:

In [76]:
handle = Entrez.esearch(db="nuccore", term="Opuntia AND rpl16")
record = Entrez.read(handle)
gi_list = record["IdList"]
gi_list

['377581039', '330887241', '330887240', '330887239', '330887238', '330887237', '330887236', '330887235', '330887233', '330887232', '330887231', '330887228', '330887227', '330887226', '330887225', '330887224', '330887223', '57240072', '57240071', '6273287']
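The list above holds 20 identifiers, which matches ESearch's default retmax of 20, so a search with more hits than that would be cut short without any error. A quick check is to compare the reported Count with the length of IdList. The sketch below uses a hypothetical helper `is_truncated` and a hand-written `sample` dictionary whose Count is made up for illustration; only the key names come from the parsed search result shown earlier.

```python
def is_truncated(search_record):
    """Return True if the search found more records than IdList holds."""
    return int(search_record["Count"]) > len(search_record["IdList"])


# Hand-written stand-in for a parsed ESearch result. The Count of 45 is
# invented for this illustration, and only three IDs are listed for brevity.
sample = {
    "Count": "45",
    "RetMax": "3",
    "IdList": ["377581039", "330887241", "330887240"],
}

if is_truncated(sample):
    print("More hits than IDs returned: raise retmax or use the history feature")
```

If the check fires, rerunning the search with a larger retmax (as in the orchid example) or switching to the history feature gives access to the full result set.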

Now we use these GIs to download the GenBank records. Note that with older versions of Biopython you had to supply a comma-separated list of GI numbers to Entrez; as of Biopython 1.59 you can pass a list, and it is converted for you:

In [77]:
gi_str = ",".join(gi_list)
handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="gb", retmode="text")

If you want to look at the raw GenBank files, you can read from this handle and print out the result:

In [78]:
text = handle.read()
print(text)

LOCUS       HQ621368                 399 bp    DNA     linear   PLN 26-FEB-2012
DEFINITION  Opuntia decumbens voucher Martinez & Eggli 146a (ZSS) ribosomal
            protein L16 (rpl16) gene, partial cds; chloroplast.
VERSION     HQ621368.1  GI:377581039
SOURCE      chloroplast Opuntia decumbens
  ORGANISM  Opuntia decumbens
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 399)
  AUTHORS   Arakaki,M., Christin,P.A., Nyffeler,R., Lendel,A., Eggli,U.,
            Ogburn,R.M., Spriggs,E., Moore,M.J. and Edwards,E.J.
  TITLE     Contemporaneous and recent radiations of the world's major
            succulent plant lineages
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 108 (20), 8379-8384 (2011)
   PUBMED   21536881
REFERENCE   2  (bases 1 to 399)
  AUTHORS   Arakaki,M., Christin,P.-A., Nyffeler,R., Eggli,U., Ogburn,R.M.,
            Spriggs,E., Moore,M.J. and Edwards,E.J.
  TITLE     Direct Submission
  JOURNAL   Submitted (15-NOV-2010) Department of Ecology and Evolutionary
            Biology, Brown University, 80 Waterman St., Providence, RI 02912,
COMMENT     ##Assembly-Data-START##
            Assembly Method       :: MIRA V3rc4; Geneious v. 4.8
            Sequencing Technology :: 454
FEATURES             Location/Qualifiers
     source          1..399
                     /organism="Opuntia decumbens"
                     /mol_type="genomic DNA"
                     /specimen_voucher="Martinez & Eggli 146a (ZSS)"
                     /note="authority: Opuntia decumbens Salm-Dyck"
     gene            <1..>399
     CDS             <1..>399
                     /product="ribosomal protein L16"
        1 aaccccaaaa gaaccagatt ctgtaaacaa catagaggaa gaatgaaggg aatatcttat
       61 cgggggaatc gtatttgttt cggaagatat gctcttcagg cacttgagcc tgcttggatc
      121 acgtctagac aaatagaagc aggtcggcga gcaatgacgc gaaatgcacg ccgcggtgga
      181 aaaatatggg tacgtatatt tccagacaaa ccagttacag taaaatctgc ggaaagccgt
      241 atgggttcgg ggaaaggatc ccacctatat tgggtagttg ttgtcaaacc cggtcgaata
      301 ctttatgaaa taagcggagt atcagaaaat atagcccgaa gggctatctc gatagcggca
      361 tctaaaatgc ctgtacgaac tcaattcatt atttcagga

LOCUS       HM041482                1197 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Cylindropuntia tunicata ribosomal protein L16-like (rpl16) gene,
            partial sequence; chloroplast.
VERSION     HM041482.1  GI:330887241
SOURCE      chloroplast Cylindropuntia tunicata
  ORGANISM  Cylindropuntia tunicata
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1197)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1197)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1197
                     /organism="Cylindropuntia tunicata"
                     /mol_type="genomic DNA"
     gene            <1..>1197
     misc_feature    <1094..>1197
                     /note="similar to ribosomal protein L16"
        1 gtgatatacg aaacagtaag agcccatagt atgaagtatg aactaataac tatagaacta
       61 ataaccaact catcgcatca cattatctgg atccaaagaa gcagtcaaga taggatattt
      121 tggtcctatc attgcagcaa ctgaattttt tttttcataa acaagaaatc gaatgagttg
      181 tcaagcaaaa gaaaaaaaaa aaaagaaaaa tatacnttaa aggaggggga tgcggataaa
      241 tggaaaggcg aaagaaagaa aaaaatgaat ctaaatgata tacgattcca ctatgtaagg
      301 tctttgaatc atatcataaa agacaatgta ataaagcatg aatacagatt cacacataat
      361 tatctgatat gaatctattc atagaaaaaa gaaaaaagta agagcctccg gccaataaag
      421 actaagaggg gttggctcaa aaacaaagtt cattaagagc tcccattgta gaattcagac
      481 ctaatcatta atcaagaagc gatgggaacg atgtaatcca tgaatacaga agattcaatt
      541 gaaaaaagaa tcctaatgat tcattgggga ggatggcgga acgaaccaga gaccaattca
      601 tctattctga aaagtgataa actaatccta taaaactaaa atagatattg aaagagtaaa
      661 tattcgcccg cgaaaattcc ttttttatta aattgctcat attttatttt agcaatgcaa
      721 tctaataaaa tatatctata caaaaaaaca tagacaaact atatatataa tatttcaaat
      781 tcccttatat atccaaatat aaaaatatct aataaattag atgaatatca aagaatctat
      841 tgatttagtg tattattaaa tgtatatctt aattcaatat tattattcta ttcattttta
      901 ttcattttca aatttataat atattaatct atatattaat ttagaattct attctaattc
      961 gaattcaatt tttaaatatt cattcatatt caattaaaat tgaaattttt tcattcgcga
     1021 ggagccggat gagaagaaac tctcatgtcc ggttctgtag tagagatgga attaagaaaa
     1081 aaccatcaac tataacccca aaagaaccag attctgtaaa caacatagag gaagaatgaa
     1141 gggaatatct tatcggggga atcgtatttg tttcggaaaa tatgctctca ggcacga

LOCUS       HM041481                1200 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Opuntia palmadora ribosomal protein L16-like (rpl16) gene, partial
            sequence; chloroplast.
VERSION     HM041481.1  GI:330887240
SOURCE      chloroplast Opuntia palmadora
  ORGANISM  Opuntia palmadora
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1200)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1200
                     /organism="Opuntia palmadora"
                     /mol_type="genomic DNA"
     gene            <1..>1200
     misc_feature    <1098..>1200
                     /note="similar to ribosomal protein L16"
        1 tgatatacga aaagtaagag cccatagtat gaagtatgaa ctaataacta tagaactaat
       61 aaccaactca tcgcatcaca ttatctggat ccaaagaagc agtcaagata ggatattttg
      121 gtcctatcat tgcagcaact gaattttttt ttcataaaca agaaatcaaa tgagttgtca
      181 agcaaaagaa aaaaaaaaga aaaatatacn ttaaaggagg gggatgcgga taaatggaaa
      241 ggcgaaagaa agaaaaaaat gaatctaaat gatatacgat tccactatgt aaggtctttg
      301 aatcatatca taaaagacaa tgtaataaag catgaataca gattcacaca taattatctg
      361 atatgaatct attcatagaa aaaagaaaaa agtaagagcc tccgggccaa taaagactaa
      421 gagggttggg ctcaagaaca aagttcatta agagctccat tgtagaattc agacctaatc
      481 attaatcaag aagcgatggg aacgatgtaa tccatgaata cagaagattc aattgaaaaa
      541 gaatcctaat gattcattgg gaaggatggc ggaacgaacc agagaccaat tcatctattc
      601 tgaaaagtga taaactaatc ctataaaact aaaatagata ttgaaagagt aaatattcgc
      661 ccgcgaaaat tcctttttta ttaaattgct cacattttat tttagcaatg caatctaata
      721 aaatatatct atacaaaaaa atatagacaa actatatata taatatattt caaatttcct
      781 tatatatcct aatataaaaa tatctaataa attagatgaa tatcaaagaa tctattgatt
      841 tagtgtatta ttaaatgtat atcttaattc aatattatta ttctattcat ttttattatt
      901 catttttatt cattttcaaa tttagaatat attaatctat atattaattt agaattctat
      961 tctaattcga attcaatttt taaatattca tattcaatta aaattgaaat tttttcattc
     1021 gcgaggagcc ggatgagaag aaactctcac gtccggttct gtagtagagg tggaattaag
     1081 aaaaaaccat caactataac cccaaaagaa ccagattctg taaacaacat agaggaagaa
     1141 tgaagggaat atcttatcgg gggaatcgta tttgtttcgg aagatatgct ctcagcacga

LOCUS       HM041480                1153 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Opuntia microdasys ribosomal protein L16-like (rpl16) gene, partial
            sequence; chloroplast.
VERSION     HM041480.1  GI:330887239
SOURCE      chloroplast Opuntia microdasys
  ORGANISM  Opuntia microdasys
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1153)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1153)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1153
                     /organism="Opuntia microdasys"
                     /mol_type="genomic DNA"
     gene            <1..>1153
     misc_feature    <1079..>1153
                     /note="similar to ribosomal protein L16"
        1 gcccatagta tgaagtatga actaataact atagaactaa taaccaactc atcgcatcac
       61 attatctgga tccaaagaag cagtcaagat aggatatttt ggtcctatca ttgcagcaac
      121 tgaatttttt ttttcataaa caagaaatca aatgagttgt caagcaaaag aaaaaaaaaa
      181 aaaaaaatat actttaaggg ggggggatgg ggataaaggg aaaggggaaa aaaaaaaaaa
      241 aatgaatcta aatgatatac aattccacta tgaaaggtct ttgaatcata tcaaaaaaaa
      301 caatgtaata aagcaggaat acagattccc acataattat ctgatatgaa tcttttcata
      361 aaaaaaaaaa aaaagtaaga gcctccggcc aataaagact aagagggttg gctcaagaac
      421 aaagttcatt aagggctcca ttgtagaatt cagacctaat cattaatcaa gaggcgatgg
      481 gaacgatgta atccatgaat acagaagatt caattgaaaa agaatcctaa tgattcattg
      541 ggaaggatgg cggaacgaac cagagaccaa ttcatctatt ctgaaaagtg aaaaactaat
      601 cctataaaac taaaatagat attgaaagag taaatattcg cccgcgaaaa ttcctttttt
      661 attaaattgc tcacatttta ttttagcaat gcaatctaat aaaatatatc tatacaaaaa
      721 aatatagaca aactatatat ataatatatt tcaaatttcc ttatatatcc taatataaaa
      781 atatctaata aattagatga atatcaaaga atctattgat ttagtgtatt attaaatgta
      841 tatcttaatt caatattatt attctattca tttttattat tcatttttat tcattttcaa
      901 atttagaata tattaatcta tatattaatt tataattcta ttctaattcg aattcaattt
      961 ttaaatattc atattcaatt aaaattgaaa ttttttcatt cgcgaggagc cggatgagaa
     1021 gaaactctca cgtccggttc tgtagtagag gtggaattaa gaaaaaacca tcaactataa
     1081 ccccaaaaga accagattct gtaaacaaca tagaggaaga atgaagggaa tatcttatcg
     1141 ggggatatcg tat

LOCUS       HM041479                1197 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Opuntia megasperma ribosomal protein L16-like (rpl16) gene, partial
            sequence; chloroplast.
VERSION     HM041479.1  GI:330887238
SOURCE      chloroplast Opuntia megasperma
  ORGANISM  Opuntia megasperma
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1197)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1197)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1197
                     /organism="Opuntia megasperma"
                     /mol_type="genomic DNA"
     gene            <1..>1197
     misc_feature    <1098..>1197
                     /note="similar to ribosomal protein L16"
        1 gatatacgaa aagtaagagc ccatagtatg aagtatgaac taataactat agaactaata
       61 accaactcat cgcatcacat tatccggatc caaagaagca gtcaagatag gatattttgg
      121 tcctatcatt gcagcaactg aatttttttt tcataaacaa gaaatcaaat gagttgtcaa
      181 gcaaaagaaa aaaaaaaaag aaaaatatac tttaaaggag ggggatgcgg ataaatggaa
      241 aggcgaaaga aagaaaaaaa tgaatctaaa tgatatacga ttccnctatg taaggtcttt
      301 gaatcatatc ataaaagaca atgtaataaa gcatgaatac agattcacac ataattatct
      361 gatatgaatc tattcataga aaaaagaaaa aagtaagagc ctccgggcca ataaagacta
      421 agagggttgg ctcaagaaca aagttcatta agagctccat tgtagaattc agacctaatc
      481 attaatcaag aagcgatggg aacgatgtaa tccatgaata cagaagattc aattgaaaaa
      541 gaatcctaat gattcattgg gaaggatggc ggaacgaacc agagaccaat tcatctattc
      601 tgaaaagtga taaactaatc ctataaaact aaaatagata ttgaaagagt aaatattcgc
      661 ccgcgaaaat tcctttttta ttaaattgct cacattttat tttagcaatg caatctaata
      721 aaatatatct atacaaaaaa atatagacaa actatatata taatatattt caaatttcct
      781 tatatatcct aatataaaaa tatctaataa attagatgaa tatcaaagaa tctattgatt
      841 tagtgtatta ttaaatgtat atcttaattc aatattttta ttctattcat ttttattatt
      901 catttttatt cattttcaaa tttagaatat attaatctat atattaattt agaattctat
      961 tctaattcga attcaatttt taaatattca tattcaatta aaattgaaat tttttcattc
     1021 gcgaggagcc ggatgagaag aaactctcac gtccggttct gtagtagagg tggaattaag
     1081 aaaaaaccat caactataac cccaaaagaa ccagattctg taaacaacat agaggaagaa
     1141 tgaagggaat atcttatcgg gggaatcgta tttgtttcgg aagatatgct ctcagca

LOCUS       HM041478                1187 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Opuntia macbridei ribosomal protein L16-like (rpl16) gene, partial
            sequence; chloroplast.
VERSION     HM041478.1  GI:330887237
SOURCE      chloroplast Opuntia macbridei
  ORGANISM  Opuntia macbridei
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1187)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1187)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1187
                     /organism="Opuntia macbridei"
                     /mol_type="genomic DNA"
     gene            <1..>1187
     misc_feature    <1090..>1187
                     /note="similar to ribosomal protein L16"
        1 aaaagtaaga gcccatagta tgaagtatga actaataact atagaactaa taaccaactc
       61 atcgcatcac attatctgga tccaaagaag cagtcaagat aggatatttt ggtcctatca
      121 ttgcagcaac tgaatttttt tttcataaac aagaaatcaa atgagttgtc aagcaaaaga
      181 aaaaaaaaaa agaaaaatat acattaaagg agggggatgc ggataaatgg aaaggcgaaa
      241 gaaagaaaaa aatgaatcta aatgatatac gattccacta tgtaaggtct ttgaatcata
      301 tcataaaaga caatgtaata aagcatgaat acagattcac acataattat ctgatatgaa
      361 tctattcata gaaaaaagaa aaaagtaaga gcctccggcc aataaagact aagagggttg
      421 gctcaagaac aaagttcatt aagggctcca tttgtagaat tcagacctaa tcattaatca
      481 agaagcgatg ggaacgatgt aattccatga atacagaaga ttcaattgaa aaagatccta
      541 atgattcatt gggaaggatg gcggacgaac cagagaccaa ttcatctatt ctgaaaagtg
      601 ataaactaat cctataaaac taaaatagat attgaaagag taaatattcg cccgcgaaaa
      661 ttcctttttt attaaattgc tcacatttta ttttagcaat gcaatctaat aaaatatatc
      721 tatacaaaaa aaatatagac aaactatata tataatatat ttcaaatttc cttatatatc
      781 ctaatataaa aatatctaat aatttagatg aatatcaaag aatctattga tttagtgtat
      841 tattaaatgt atatcttaat tcaatattat tattctattc atttttatta ttcattttta
      901 ttcattttca aatttagaat atattaatct atatattaat ttagaattct attctaattc
      961 gaattcaatt tttaaatatt catattcaat taaaattgaa attttttcat tcgcgaggag
     1021 ccggatgaga agaaactctc acgtccggtt ctgtagtaga ggtggaatta agaaaaaacc
     1081 atcaactata accccaaaag aaccagattc tgtaaacaac atagaggaag aatgaaggga
     1141 atatcttatc gggggaatcg tatttgtttc ggaagatatg ctctcag

LOCUS       HM041477                1197 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Cylindropuntia leptocaulis ribosomal protein L16-like (rpl16) gene,
            partial sequence; chloroplast.
VERSION     HM041477.1  GI:330887236
SOURCE      chloroplast Cylindropuntia leptocaulis
  ORGANISM  Cylindropuntia leptocaulis
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1197)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1197)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1197
                     /organism="Cylindropuntia leptocaulis"
                     /mol_type="genomic DNA"
     gene            <1..>1197
     misc_feature    <1096..>1197
                     /note="similar to ribosomal protein L16"
        1 ttgtgngnct cctgaagagt aggagcccct agtatgaagt atgaactaat aactatagaa
       61 ctaataacca actcatcgca tcacattatc cggatccaaa aaagcagtca agataggata
      121 ttttggtcct atcattgcag caactgaatt ttttttttca taaacaagaa atcgaatgag
      181 ttgtcaagca aaagaaaaaa aaagaaaaat atactttaaa ggagggggat gcggataaat
      241 ggaaaggcga aagaaagaaa aaaatgaatc taaatgatat aggattcccc tatgtaaggt
      301 ctttgaatca tatcataaaa gacaatgtaa taaagcatga atacagattc ccacataatt
      361 atctgatatg aatctattcc tagaaaaaag aaaaaagtaa gagcctccgg ccaataaaga
      421 ctaagagggt tggctcaaga acaaagttca ttaaaagctc ccttgtagaa ttcagaccta
      481 atcnttaatc aagaagcgat gggaacgatg taatccctga atacagaaga ttcaattgaa
      541 aaagaatcct aatgattcat tgggaaggat ggcggaacga accagagacc aattcatcta
      601 ttctgaaaag tgataaacta atcctataaa actaaaatag atattgaaag agtaaatatt
      661 cgcccgcgaa atttcctttt ttattaaatt gctcatattt ttttttagca atgcaatcta
      721 ataaaatata tctctacaaa aaaacataga caaactatat atatatatat atataatatt
      781 tcaaattccc ttatatatcc aaatataaaa atatctaata aattagatga atatcaaaga
      841 atctattgat ttagtgtatt attaaatgta tatcttaatt caatattatt attctattca
      901 tttttattca ttttcaaatt tataatatat taatctatat attaatatag aattctattc
      961 taattcgaat tcaattttta aatattcata ttcaattaaa attgaaattt tttcattcgc
     1021 gaggagccgg atgagaagaa actctcatgt ccggttctgt agtagagatg gaattaagaa
     1081 aaaaccatca actataaccc caaaagaacc ggattctgta aacaacatag aggaagaatg
     1141 aagggaatat cttgtcgggg gaatcgatnn gtncggaant natgntcgcn gcgcgcc

LOCUS       HM041476                1205 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Opuntia lasiacantha ribosomal protein L16-like (rpl16) gene,
            partial sequence; chloroplast.
VERSION     HM041476.1  GI:330887235
SOURCE      chloroplast Opuntia lasiacantha
  ORGANISM  Opuntia lasiacantha
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1205)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1205)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1205
                     /organism="Opuntia lasiacantha"
                     /mol_type="genomic DNA"
     gene            <1..>1205
     misc_feature    <1103..>1205
                     /note="similar to ribosomal protein L16"
        1 gggcccnnna ngangaaaag tagagcccat agtatgaagt atgaactaat aactatagaa
       61 ctaataacca actcatcgca tcacattatc tggatccaaa gaagcagtca agataggata
      121 ttttggtcct atcattgcag caactgaatt ttttttttca taaacaagaa atcaaatgag
      181 ttgtcaagca aaagaaaaaa aaaaagaaaa atatccttta aaggaggggg atgcggataa
      241 atggaaaggc gaaagaaaga aaaaaatgaa tctaaatgat atacgattcc cctatgtaag
      301 gtctttgaat catatcataa aagacaatgt aataaagcat gaatacagat tcccccataa
      361 ttatctgata tgaatctatt cctagaaaaa agaaaaaagt aagagcctcc ggccaataaa
      421 gactaagagg gttggctcaa gaacaaagtt cattaagggc tccattgtag aattcagacc
      481 taatcattaa tcaagaggcg atgggaacga tgtaatccat gaatacagaa gattcaattg
      541 aaaaagaatc ctaatgattc attgggaagg atggcggaac gaaccagaga ccaattcatc
      601 tattctgaaa agtgataaac taatcctata aaactaaaat agatattgaa agagtaaata
      661 ttcgcccgcg aaaattcctt ttttattaaa ttgctcacat tttattttag caatgcaatc
      721 taataaaatc tatctataca aaaaaatata gacaaactat atatataata tatttcaaat
      781 ttccttatat atcctaatat aaaaatatct aataaattag atgaatatca aagaatctat
      841 tgatttagtg tattattaaa tgtatatctt aattcaatat tattattcta ttcattttta
      901 ttattcattt ttattcattt tcaaatttag aatatattaa tctatatatt aatttataat
      961 tctattctaa ttcgaattca atttttaaat attcatattc aattaaaatt gaaatttttt
     1021 cattcgcgag gagccggatg agaagaaact ctcacgtccg gttctgtagt agaggtggaa
     1081 ttaagaaaaa accatcaact ataaccccaa aagaaccaga ttctgtaaac aacatagagg
     1141 aagaatgaag ggaatatctt atcgagggaa tcgtatttgt ttcggaagat agtnctngcn
     1201 nggtg

LOCUS       HM041474                1163 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Opuntia helleri ribosomal protein L16-like (rpl16) gene, partial
            sequence; chloroplast.
VERSION     HM041474.1  GI:330887233
SOURCE      chloroplast Opuntia helleri
  ORGANISM  Opuntia helleri
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1163)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1163)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1163
                     /organism="Opuntia helleri"
                     /mol_type="genomic DNA"
     gene            <1..>1163
     misc_feature    <1081..>1163
                     /note="similar to ribosomal protein L16"
        1 gagcccatag tatgaagtat gaactaataa ctatagaact aataaccaac tcatcgcatc
       61 acattatccg gatccaaaga agcagtcaag ataggatatt ttggtcctat cattgcagca
      121 actgaatttt tttttcataa acaagaaatc aaatgagttg tcaagcaaaa gaaaaaaaaa
      181 aaagaaaaat atacattaaa ggagggggat gcggataaat ggaaaggcga aagaaagaaa
      241 aaaatgaatc taaatgatat acgattccnc tatgtaaggt ctttgaatca tatcataaaa
      301 gacaatgtaa taaagcatga atacagattc acacataatt atctgatatg aatctattca
      361 tagaaaaaag aaaaaagtaa gagcctccgg ccaataaaga ctaagagggt tggctcaaga
      421 acaaagttca ttaagggctc cattgtagaa ttcagaccta atcattaatc aagaagcgat
      481 gggaacgatg taatccatga atacagaaga ttcaattgaa aaagaatcct aatgattcat
      541 tgggaaggat ggcggaacga accagagacc aattcatcta ttctgaaaag tgataaacta
      601 atcctataaa actaaaatag atattgaaag agtaaatatt cgcccgcgaa aattcctttt
      661 ttattaaatt gctcacattt tattttagca atgcaatcta ataaaatata tctatacaaa
      721 aaaatataga caaactatat atataatata tttaaaattt ccttatatat cctaatataa
      781 aaatatctaa taaattagat gaatatcaaa gaatctattg atttagtgta ttattaaatg
      841 tatatcttaa ttcaatattt ttattctatt catttttatt attcattttt attcattttc
      901 aaatttagaa tatattaatc tatatattaa tttagaattc tattctaatt cgaattcaat
      961 ttttaaatat tcatattcaa ttaaaattga aattttttca ttcgcgagga gccggatgag
     1021 aagaaactct cacgtccggt tctgtagtag aggtggaatt aagaaaaaac catcaactat
     1081 aaccccaaaa gaaccagatt ctgtaaacaa catagaggaa gaatgaaggg aatatcttat
     1141 cgggggaatc gtatttgttt cgg

LOCUS       HM041473                1203 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Opuntia excelsa ribosomal protein L16-like (rpl16) gene, partial
            sequence; chloroplast.
VERSION     HM041473.1  GI:330887232
SOURCE      chloroplast Opuntia excelsa
  ORGANISM  Opuntia excelsa
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1203)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1203)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1203
                     /organism="Opuntia excelsa"
                     /mol_type="genomic DNA"
     gene            <1..>1203
     misc_feature    <1103..>1203
                     /note="similar to ribosomal protein L16"
        1 ccgnncnttg nnanacagaa nagtagagcc cnttntntga agtatgaact aatcactatt
       61 gaactaatcc ccnactcatc gcatcacatt atctggatcc aaagaagcag tcaagatagg
      121 atattttggt cctatcattg cagcaactga attttttttt tcctaaacaa gaaatcaaat
      181 gagttgtcaa gcaaaagaaa aaaaagaaaa atatacatta aaggaggggg atgcggataa
      241 atggaaaggc gaaagaaaga aaaaaatgaa tctaaatgat atacgattcc cctatgtaag
      301 gtctttgaat catatcataa aagacaatgt aataaagcat gaatacagat tcccacataa
      361 ttatctgata tgaatctatt catagaaaaa agaaaaaagt aagagcctcc ggccaataaa
      421 gactaaaagg gttggctcaa gaacaaagtt cattaagggc tccattgtaa aattcagacc
      481 taatcattaa tcaagaggcg atgggaacga tgtaatccat gaatacagaa gattcaattg
      541 aaaaagaatc ctaatgattc attgggaagg atggcggaac gaaccagaga ccaattcatc
      601 tattctgaaa agtgataaac taatcctata aaactaaaat agatattgaa agagtaaata
      661 ttcgcccgcg aaaattcctt ttttattaaa ttgctcacat tttattttag caatgcaatc
      721 taataaaatc tatctataca aaaaaatata gacaaactat atatataata tatttcaaat
      781 ttccttatat atcctaatat aaaaatatct aataaattag atgaatatca aagaatctat
      841 tgatttagtg tattattaaa tgtatatctt aattcaatat tattattcta ttcattttta
      901 ttattcattt ttattcattt tcaaatttag aatatattaa tctatatatt aatttagaat
      961 tctattctaa ttcgaattca atttttaaat attcatattc aattaaaatt gaaatttttt
     1021 cattcgcgag gagccggatg agaagaaact ctcacgtccg gttctgtagt agaggtggaa
     1081 ttaagaaaaa accatcaact ataaccccaa aagaaccaga ttctgtaaac aacatagagg
     1141 aagaatgaag ggaatatctt atcgggggaa tcgtatngtg cnggctngtg cancgcgggc
     1201 nng

LOCUS       HM041472                1182 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Opuntia echios ribosomal protein L16-like (rpl16) gene, partial
            sequence; chloroplast.
VERSION     HM041472.1  GI:330887231
SOURCE      chloroplast Opuntia echios
  ORGANISM  Opuntia echios
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1182)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1182)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1182
                     /organism="Opuntia echios"
                     /mol_type="genomic DNA"
     gene            <1..>1182
     misc_feature    <1085..>1182
                     /note="similar to ribosomal protein L16"
        1 gtaagagccc atagtatgaa gtatgaacta ataactatag aactaataac caactcatcg
       61 catcacatta tccggatcca aagaagcagt caagatagga tattttggtc ctatcattgc
      121 agcaactgaa tttttttttc ataaacaaga aatcaaatga gttgtcaagc aaaagaaaaa
      181 aaaaaaaaaa aaatatacat taaaggaggg ggatgcggat aaatggaaag gcgaaagaaa
      241 gaaaaaaatg aatctaaatg atatacgatt ccactatgta aggtctttga atcatatcat
      301 aaaagacaat gtaataaagc atgaatacag attcacacat aattatctga tatgaatcta
      361 ttcatagaaa aaagaaaaaa gtaagagcct ccggccaata aagactaaga ggttgggctc
      421 aagaacaaag ttcattaagg gctccattgt agaattcaga cctaatcatt aatcaagaag
      481 cgatgggaac gatgtaatcc atgaatacag aagattcaat tgaaaaagaa tcctaatgat
      541 tcattgggaa ggatggcgga acgaaccaga gaccaattca tctattctga aaagtgataa
      601 actaatccta taaaactaaa atagatattg aaagagtaaa tattcgcccg cgaaaattcc
      661 ttttttatta aattgctcac attttatttt agcaatgcaa tctaataaaa tatatctata
      721 caaaaaaata tagacaaact atatatataa tatatttcaa atttccttat atatcctaat
      781 ataaaaatat ctaataaatt agatgaatat caaagaatct attgatttag tgtattatta
      841 aatgtatatc ttaattcaat attattattc tattcatttt tattattcat ttttattcat
      901 tttcaaattt agaatatatt aatctatata ttaatttaga attctattct aattcgaatt
      961 caatttttaa atattcatat tcaattaaaa ttgaaatttt ttcattcgcg aggagccgga
     1021 tgagaagaaa ctctcacgtc cggttctgta gtagaggtgg aattaagaaa aaaccatcaa
     1081 ctataacccc aaaagaacca gattctgtaa acaacataga ggaagaatga agggaatatc
     1141 ttatcggggg aatcgtattt gtttcggaag atggctacta ta

LOCUS       HM041469                1189 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Nopalea sp. THH-2011 ribosomal protein L16-like (rpl16) gene,
            partial sequence; chloroplast.
VERSION     HM041469.1  GI:330887228
SOURCE      chloroplast Opuntia sp. THH-2011
  ORGANISM  Opuntia sp. THH-2011
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1189)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1189)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1189
                     /organism="Opuntia sp. THH-2011"
                     /mol_type="genomic DNA"
     gene            <1..>1189
     misc_feature    <1096..>1189
                     /note="similar to ribosomal protein L16"
        1 atatacgaaa aagtagagcc catagtatga agtatgaact aataactata gaactaataa
       61 ccaactcatc gcatcacatt atctggatcc aaagaagcag tcaagatagg atattttggt
      121 cctatcattg cagcaactga attttttttt cataaacaag aaatcaaatg agttgtcaag
      181 caaaagaaaa aaaaaaaaga aaaatatact ttaagggagg gggatgcgga taaatggaaa
      241 ggcgaaagaa agaaaaaaat gaatctaaat gatatacgat tccactatgt aaggtctttg
      301 aatcatatca taaaagacaa tgtaataaag catgaataca gattcacaca taattatctg
      361 gtatgaatct attcatagaa aaaagaaaaa agtaagaccc tccggccaat aaagactaag
      421 agggttggct caagaacaaa gttcattaag ggctccattg tagaattcag acctaatcat
      481 taatcaagaa gcgatgggaa cgatgtaatc catgaataca gaagattcaa ttgaaaaaga
      541 atcctaatga ttcattggga aggatggcgg aacgaaccag agaccaattc atctattctg
      601 aaaagtgata aactaatcct ataaaactaa aatagatatt gaaagagtaa atattcgccc
      661 gcgaaaattc cttttttatt aaattgctca cattttattt tagcaatgca atctaataaa
      721 atatatctat acaaaaaaat atagacaaac tatatatata atatatttca aatttcctta
      781 tatatcctaa tataaaaata tctaataaat tagatgaata tcaaagaatc tattgattta
      841 gtgtattatt aaatgtatat cttaattcaa tattattatt ctattcattt ttattattca
      901 tttttattca ttttcaaatt tagaatatat taatctatat attaatttag aattctattc
      961 taattcgaat tcaattttta aatattcata ttcaattaaa attgaaattt tttcattcgc
     1021 gaggagccgg atgagaagaa actctcacgt ccggttctgt agtagaggtg gaattaagaa
     1081 aaaaccatca actataaccc caaaagaacc agattctgta aacaacatag aggaagaatg
     1141 aagggaatat cttatcgggg gaatcgtatt tgtttcggaa gatatgctc

LOCUS       HM041468                1202 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Nopalea lutea ribosomal protein L16-like (rpl16) gene, partial
            sequence; chloroplast.
VERSION     HM041468.1  GI:330887227
SOURCE      chloroplast Opuntia lutea
  ORGANISM  Opuntia lutea
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1202)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1202)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1202
                     /organism="Opuntia lutea"
                     /mol_type="genomic DNA"
     gene            <1..>1202
     misc_feature    <1099..>1202
                     /note="similar to ribosomal protein L16"
        1 gatatacgaa aaagtaagag cccatagtat gaagtatgaa ctaataacta tagaactaat
       61 aaccaactca tcgcatcaca ttatctggat ccaaagaagc agtcaagata ggatattttg
      121 gtcctatcat tgcagcaact gaattttttt ttcataaaca agaaatcaaa tgagttgtca
      181 agcaaaagaa aaaaaaaaaa gaaaaatata ctttaaggga gggggatgcg gataaatgga
      241 aaggcgaaag aaagaaaaaa atgaatctaa atgatatacg attcccccta tgtaaggtct
      301 ttgaatcata tcataaaaga caatgtaata aagcatgaat acagattcac acataattat
      361 ctgatatgaa tctattcata gaaaaaagaa aaaagtaaga ccctccggcc aataaagact
      421 aagagggttg gctcaagaac aaagttcatt aagggctcca ttgtagaatt cagacctaat
      481 cattaatcaa gaagcgatgg gaacgatgta atccatgaat acagaagatt caattgaaaa
      541 agaatcctaa tgattcattg ggaaggatgg cggaacgaac cagagaccaa ttcatctatt
      601 ctgaaaagtg ataaactaat cctataaaac taaaatagat attgaaagag taaatattcg
      661 cccgcgaaaa ttcctttttt attaaattgc tcacatttta ttttagcaat gcaatctaat
      721 aaaatatatc tatacaaaaa aatatagaca aactatatat ataatatatt tcaaatttcc
      781 ttatatatcc taatataaaa atatctaata aattagatga atatcaaaga atctattgat
      841 ttagtgtatt attaaatgta tatcttaatt caatattatt attctattca tttttattat
      901 tcatttttat tcattttcaa atttagaata tattaatcta tatattaatt tagaattcta
      961 ttctaattcg aattcaattt ttaaatattc atattcaatt aaaattgaaa ttttttcatt
     1021 cgcgaggagc cggatgagaa gaaactctca cgtccggttc tgtagtagag gtggaattaa
     1081 gaaaaaacca tcaactataa ccccaaaaga accagattct gtaaacaaca tagaggaaga
     1141 atgaagggaa tatcttatcg ggggaatcgt atttgtttcg gaagatatgc tctcaggcac
     1201 ga

LOCUS       HM041467                1199 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Nopalea karwinskiana ribosomal protein L16-like (rpl16) gene,
            partial sequence; chloroplast.
VERSION     HM041467.1  GI:330887226
SOURCE      chloroplast Opuntia karwinskiana
  ORGANISM  Opuntia karwinskiana
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1199)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1199)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1199
                     /organism="Opuntia karwinskiana"
                     /mol_type="genomic DNA"
     gene            <1..>1199
     misc_feature    <1098..>1199
                     /note="similar to ribosomal protein L16"
        1 gtgatatcga aaaagtagag cccatagtat gaagtatgaa ctaataacta tagaactaat
       61 aaccaactca tcgcatcaca ttatctggat ccaaagaagc agtcaagata ggatattttg
      121 gtcctatcat tgcagcaact gaattttttt ttcataaaca agaaatcaaa tgagttgtca
      181 agcaaaagaa aaaaaaaaaa gaaaaattta ctttaaggga gggggatgcg gataaatgga
      241 aaggcgaaag aaagaaaaaa atgaatctaa atgatatacg attcccctat gtagggtctt
      301 tgaatcatat cataaaaaac aatgtaataa agcatgaata cagattcccc cataattatc
      361 tggtatgaat cttttcatag aaaaaaaaaa aaagtaagag cctccggcca ataaaaacta
      421 aaagggttgg ctcaagaaca aagttcatta agggctccat tgtagaattc agacctaatc
      481 nttaatcaag aagcgatggg aacgatgtaa tccatgaata cagaagattc aattgaaaaa
      541 gaatcctaat gattcattgg gaaggatggc ggaacgaacc agagaccaat tcatctattc
      601 tgaaaagtga taaactaatc ctataaaact aaaatagata ttgaaagagt aaatattcgc
      661 ccgcgaaaat tcctttttta ttaaattgct cacattttat tttagcaatg caatctaata
      721 aaatatatct atacaaaaaa atatagacaa actatatata taatatattt caaatttcct
      781 tatatatcct aatataaaaa tatctaataa attagatgaa tatcaaagaa tctattgatt
      841 tagtgtatta ttaaatgtat atcttaattc aatattatta ttctattcat ttttattatt
      901 catttttatt cattttcaaa tttagaatat attaatctat atattaattt agaattctat
      961 tctaattcga attcaatttt taaatattca tattcaatta aaattgaaat tttttcattc
     1021 gcgaggagcc ggatgagaag aaactctcac gtccggttct gtagtagagg tggaattaag
     1081 aaaaaaccat caactataac cccaaaagaa ccagattctg taaacaacat agaggaagaa
     1141 tgaagggaat atcttatgcg ggggaatcgt attgtttcgg aagatatgct ctgcggccc

LOCUS       HM041466                1205 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Nopalea gaumeri ribosomal protein L16-like (rpl16) gene, partial
            sequence; chloroplast.
VERSION     HM041466.1  GI:330887225
SOURCE      chloroplast Nopalea gaumeri
  ORGANISM  Nopalea gaumeri
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1205)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1205)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1205
                     /organism="Nopalea gaumeri"
                     /mol_type="genomic DNA"
     gene            <1..>1205
     misc_feature    <1103..>1205
                     /note="similar to ribosomal protein L16"
        1 gctgtgatat acgaaanagt aagagcccat agtatgaagt atgaactaat aactatagaa
       61 ctaataacca actcatcgca tcacattatc tggatccaaa gaagcagtca agataggata
      121 ttttggtcct atcattgcag caactgaatt tttttttcat aaacaagaaa tcaaatgagt
      181 tgtcaagcaa aagaaaaaaa aaaaaaaaaa tatacattaa aggaggggga tgcggataaa
      241 tggaaaggcg aaagaaagaa aaaaatgaat ctaaatgata tacgattcca ctatgtaagg
      301 tctttgaatc atatcataaa agacaatgta ataaagcatg aatacagatt cacacataat
      361 tatctgaata tgaatctatt catagaaaaa agaaaaaagt aagaccctcc ggccaataaa
      421 gactaaaggg gttggctcaa gaacaaagtt cattaagggc tccattgtag aattcagacc
      481 taatcattaa tcaagaagcg atgggaacga tgtaatccat gaatacagaa gattcaattg
      541 aaaaagaatc ctaatgattc attgggaagg atggcggaac gaaccagaga ccaattcatc
      601 tattctgaaa agtgataaac taatcctata aaactaaaat agatattgaa agagtaaata
      661 ttcgcccgcg aaaattcctt ttttattaaa ttgctcacat tttattttag caatgcaatc
      721 taataaaata tatctataca aaaaaatata gacaaactat atatataata tatttcaaat
      781 ttccttatat atcctaatat aaaaatatct aataaattag atgaatatca aagaatctat
      841 tgatttagtg tattattaaa tgtatatctt aattcaatat tattattcta ttcattttta
      901 ttattcattt ttattcattt tcaaatttag aatatattaa tctatatatt aatttagaat
      961 tctattctaa ttcgaattca atttttaaat attcatattc aattaaaatt gaaatttttt
     1021 cattcgcgag gagccggatg agaagaaact ctcacgtccg gttctgtagt agaggtggaa
     1081 ttaagaaaaa accatcaact ataaccccaa aagaaccaga ttctgtaaac aacatagagg
     1141 aagaatgaag ggaatatctt atcgggggaa tcgtatttgt ttcggaagat atgctctcag
     1201 cacga

LOCUS       HM041465                1190 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Nopalea dejecta ribosomal protein L16-like (rpl16) gene, partial
            sequence; chloroplast.
VERSION     HM041465.1  GI:330887224
SOURCE      chloroplast Opuntia dejecta
  ORGANISM  Opuntia dejecta
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1190)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1190)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1190
                     /organism="Opuntia dejecta"
                     /mol_type="genomic DNA"
     gene            <1..>1190
     misc_feature    <1096..>1190
                     /note="similar to ribosomal protein L16"
        1 tgatatacga aanagtaaga gcccatagta tgaagtatga actaataact atagaactaa
       61 taaccaactc atcgcatcac attatctgga tccaaagaag cagtcaagat aggatatttt
      121 ggtcctatca ttgcagcaac tgaatttttt tttcataaac aagaaatcaa atgagttgtc
      181 aagcaaaaga aaaaaaaaaa aaaaaatata ctttaangga gggggatgcg gataaatgga
      241 aaggcgaaag aaagaaaaaa atgaatctaa atgatatacg attccactat gtaaggtctt
      301 tgaatcatat cataaaagac aatgtaataa agcatgaata cagattcaca cataattatc
      361 tgtatgatct attcatagaa aaaagaaaaa agtaagagcc tccggccaat aaagactaag
      421 agggttggct caagaacaaa gttcattaag ggctccattg tagaattcag acctaatcat
      481 taatcaagaa gcgatgggaa cgatgtaatc catgaataca gaagattcaa ttgaaaaaga
      541 atcctaatga ttcattggga aggatggcgg aacgaaccag agaccaattc atctattctg
      601 aaaagtgata aactaatcct ataaaactaa aatagatatt gaaagagtaa atattcgccc
      661 gcgaaaattc cttttttatt aaattgctca cattttattt tagcaatgca atctaataaa
      721 atatatctat acaaaaaaat atagacaaac tatatatata atatatttca aatttcctta
      781 tatatcctaa tataaaaata tctaataaat tagatgaata tcaaagaatc tattgattta
      841 gtgtattatt aaatgtatat cttaattcaa tattattatt ctattcattt ttattattca
      901 tttttattca ttttcaaatt tagaatatat taatctatat attaatttag aattctattc
      961 taattcgaat tcaattttta aatattcata ttcaattaaa attgaaattt tttcattcgc
     1021 gaggagccgg atgagaagaa actctcacgt ccggttctgt agtagaggtg gaattaagaa
     1081 aaaaccatca actataaccc caaaagaacc agattctgta aacaacatag aggaagaatg
     1141 aagggaatat cttatcgggg gaatcgtatt tgtttcggaa gatatgctct

LOCUS       HM041464                1184 bp    DNA     linear   PLN 03-MAY-2011
DEFINITION  Nopalea cochenillifera ribosomal protein L16-like (rpl16) gene,
            partial sequence; chloroplast.
VERSION     HM041464.1  GI:330887223
SOURCE      chloroplast Opuntia cochenillifera
  ORGANISM  Opuntia cochenillifera
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 1184)
  AUTHORS   Hernandez-Hernandez,T., Hernandez,H.M., De-Nova,J.A., Puente,R.,
            Eguiarte,L.E. and Magallon,S.
  TITLE     Phylogenetic relationships and evolution of growth form in
            Cactaceae (Caryophyllales, Eudicotyledoneae)
  JOURNAL   Am. J. Bot. 98 (1), 44-61 (2011)
   PUBMED   21613084
REFERENCE   2  (bases 1 to 1184)
  AUTHORS   Hernandez-Hernandez,T., Magallon,S.A., Hernandez,H.M., De-Nova,A.,
            Puente,R. and Eguiarte,L.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (17-MAR-2010) Departamento de Botanica, Instituto de
            Biologia, Universidad Nacional Autonoma de Mexico, 3er Circuito de
            Ciudad Universitaria, Ciudad Universitaria, Coyoacan, Distrito
            Federal C.P. 04510, Mexico
FEATURES             Location/Qualifiers
     source          1..1184
                     /organism="Opuntia cochenillifera"
                     /mol_type="genomic DNA"
     gene            <1..>1184
     misc_feature    <1102..>1184
                     /note="similar to ribosomal protein L16"
        1 gctgtgatat acgaaaaagt aagagcccat agtatgaagt atgaactaac aactatagaa
       61 ctaataacca actcatcgca tcacattatc tggatccaaa gaagcagtca agataggata
      121 ttttggtcct atcattgcag caactgaatt tttttttcat aaacaagaaa tcaaatgagt
      181 tgtcaagcaa aagaaaaaaa aaaaaaaaaa tatactttaa aggaggggga tgcggataaa
      241 tggaaaggcg aaagaaagaa aaaaatgaat ctaaatgata tacgattcca ctatgtaagg
      301 tctttgaatc atatcataaa agacaatgta ataaagcatg aatacagatt cccacataat
      361 tatctgatat gaatctattc atagaaaaaa gaaaaaagta agagcctccg gccaataaag
      421 actaagaggg ttggctcaag aacaaagttc attaagggct ccattgtaga attcagacct
      481 aatcattaat caagaagcga tgggaacgat gtaatccatg aatacagaag attcaattga
      541 aaaagaatcc taatgattca ttgggaagga tggcggaacg aaccagagac caattcatct
      601 attctgaaaa gtgataaact aatcctataa aactaaaata gatattgaaa gagtaaatat
      661 tcgcccgcga aaattccttt tttattaaat tgctcacatt ttattttagc aatgcaatct
      721 aataaaatat atctatacaa aaaaatatag acaaactata tatataatat atttcaaatt
      781 tccttatata tcctaatata aaaatatcta ataaattaga tgaatatcaa agaatctatt
      841 gatttagtgt attattaaat gtatatctta attcaatatt attattctat tcatttttat
      901 tattcatttt tattcatttt caaatttaga atatattaat ctatatatta atttagaatt
      961 ctattctaat tcgaattcaa tttttaaata ttcatattca attaaaattg aaattttttc
     1021 attcgcgagg agccggatga gaagaaactc tcacgtccgg ttctgtagta gaggtggaat
     1081 taagaaaaaa ccatcaacta taaccccaaa agaaccagat tctgtaaaca acatagagga
     1141 agaatgaagg gaatatctta tcgggggaat cgtatttgtt tcgg

LOCUS       AY851612                 892 bp    DNA     linear   PLN 10-APR-2007
DEFINITION  Opuntia subulata rpl16 gene, intron; chloroplast.
VERSION     AY851612.1  GI:57240072
SOURCE      chloroplast Austrocylindropuntia subulata
  ORGANISM  Austrocylindropuntia subulata
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 892)
  AUTHORS   Butterworth,C.A. and Wallace,R.S.
  TITLE     Molecular Phylogenetics of the Leafy Cactus Genus Pereskia
  JOURNAL   Syst. Bot. 30 (4), 800-808 (2005)
REFERENCE   2  (bases 1 to 892)
  AUTHORS   Butterworth,C.A. and Wallace,R.S.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-DEC-2004) Desert Botanical Garden, 1201 North Galvin
            Parkway, Phoenix, AZ 85008, USA
FEATURES             Location/Qualifiers
     source          1..892
                     /organism="Austrocylindropuntia subulata"
                     /mol_type="genomic DNA"
     gene            <1..>892
     intron          <1..>892
        1 cattaaagaa gggggatgcg gataaatgga aaggcgaaag aaagaaaaaa atgaatctaa
       61 atgatatacg attccactat gtaaggtctt tgaatcatat cataaaagac aatgtaataa
      121 agcatgaata cagattcaca cataattatc tgatatgaat ctattcatag aaaaaagaaa
      181 aaagtaagag cctccggcca ataaagacta agagggttgg ctcaagaaca aagttcatta
      241 agagctccat tgtagaattc agacctaatc attaatcaag aagcgatggg aacgatgtaa
      301 tccatgaata cagaagattc aattgaaaaa gatcctaatg atcattggga aggatggcgg
      361 aacgaaccag agaccaattc atctattctg aaaagtgata aactaatcct ataaaactaa
      421 aatagatatt gaaagagtaa atattcgccc gcgaaaattc cttttttatt aaattgctca
      481 tattttattt tagcaatgca atctaataaa atatatctat acaaaaaaat atagacaaac
      541 tatatatata taatatattt caaatttcct tatataccca aatataaaaa tatctaataa
      601 attagatgaa tatcaaagaa tctattgatt tagtgtatta ttaaatgtat atcttaattc
      661 aatattatta ttctattcat ttttattcat tttcaaattt ataatatatt aatctatata
      721 ttaatttata attctattct aattcgaatt caatttttaa atattcatat tcaattaaaa
      781 ttgaaatttt ttcattcgcg aggagccgga tgagaagaaa ctctcatgtc cggttctgta
      841 gtagagatgg aattaagaaa aaaccatcaa ctataacccc aagagaacca ga

LOCUS       AY851611                 881 bp    DNA     linear   PLN 10-APR-2007
DEFINITION  Opuntia polyacantha rpl16 gene, intron; chloroplast.
VERSION     AY851611.1  GI:57240071
SOURCE      chloroplast Opuntia polyacantha
  ORGANISM  Opuntia polyacantha
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 881)
  AUTHORS   Butterworth,C.A. and Wallace,R.S.
  TITLE     Molecular Phylogenetics of the Leafy Cactus Genus Pereskia
  JOURNAL   Syst. Bot. 30 (4), 800-808 (2005)
REFERENCE   2  (bases 1 to 881)
  AUTHORS   Butterworth,C.A. and Wallace,R.S.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-DEC-2004) Desert Botanical Garden, 1201 North Galvin
            Parkway, Phoenix, AZ 85008, USA
FEATURES             Location/Qualifiers
     source          1..881
                     /organism="Opuntia polyacantha"
                     /mol_type="genomic DNA"
     gene            <1..>881
     intron          <1..>881
        1 cattaaagga gggggatgcg gataaatgga aaggcgaaag aaagaaaaaa atgaatctaa
       61 atgatatacg attccactat gtaaggtctt tgaatcatat cataaaagac aatgtaataa
      121 agcatgaata cagattcaca cataattatc tgatatgaat ctattcatag aaaaaagaaa
      181 aaagtaagag cctccggcca ataaagacta agagggttgg ctcaagaaca aagttcatta
      241 agggctccat tgtagaattc agacctaatc attaatcaag aagcgatggg aacgatgtaa
      301 tccatgaata cagaagattc aattgaaaaa gatcctaatg atcattggga aggatggcgg
      361 aacgaaccag agaccaattc atctattctg aaaagtgata aactaatcct ataaaactaa
      421 aatagatatt gaaagagtaa atattcgccc gcgaaaattc cttttttatt aaattgctca
      481 cattttattt tagcaatgca atctaataaa atatatctat acaaaaaaat atagacaaac
      541 tctatatata atatatttca aatttcctta tatatcctaa tataaaaata tctaataaat
      601 tagatgaata tcaaagaatc tattgattta gtgtattatt aaatgtatat cttaattcaa
      661 tattattatt ctattcattt tcaaatttag aatatattaa tctatatatt aatttagaat
      721 tctattctaa ttcgaattca atttttaaat attcatattc aattaaaatt gaaatttttt
      781 cattcgcgag gagccggatg agaagaaact ctcacgtccg gttactgtag tagaggtgga
      841 attaagaaaa aaccatcaac tataacccca aaagaaccag a

LOCUS       AF191661                 895 bp    DNA     linear   PLN 07-NOV-1999
DEFINITION  Opuntia kuehnrichiana rpl16 gene; chloroplast gene for chloroplast
            product, partial intron sequence.
VERSION     AF191661.1  GI:6273287
SOURCE      chloroplast Cumulopuntia sphaerica
  ORGANISM  Cumulopuntia sphaerica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae;
            Pentapetalae; Caryophyllales; Cactineae; Cactaceae; Opuntioideae;
REFERENCE   1  (bases 1 to 895)
  AUTHORS   Dickie,S.L. and Wallace,R.S.
  TITLE     Phylogeny of the subfamily Opuntioideae (Cactaceae)
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 895)
  AUTHORS   Dickie,S.L. and Wallace,R.S.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-SEP-1999) Botany, Iowa State University, 353 Bessey
            Hall, Ames, IA 50011-1020, USA
FEATURES             Location/Qualifiers
     source          1..895
                     /organism="Cumulopuntia sphaerica"
                     /mol_type="genomic DNA"
                     /note="subfamily Opuntioideae; synonym: Cumulopuntia
     gene            <1..>895
     intron          <1..>895
        1 tatacattaa agaaggggga tgcggataaa tggaaaggcg aaagaaagaa aaaaatgaat
       61 ctaaatgata tacgattcca ctatgtaagg tctttgaatc atatcataaa agacaatgta
      121 ataaagcatg aatacagatt cacacataat tatctgatat gaatctattc atagaaaaaa
      181 gaaaaaagta agagcctccg gccaataaag actaagaggg ttggctcaag aacaaagttc
      241 attaagagct ccattgtaga attcagacct aatcattaat caagaagcga tgggaacgat
      301 gtaatccatg aatacagaag attcaattga aaaagatcct atgatccatt gggaaggatg
      361 gcggaacgaa ccagagacca attcatctat tctgaaaagt gataaactaa tcctataaaa
      421 ctaaaataga tattgaaaga gtaaatattc gcccgcgaaa attccttttt tttttaaatt
      481 gctcatattt tattttagca atgcaatcta ataaaatata tctatacaaa aaaataaaga
      541 caaactatat atataatata tttcaaattt ccttatatat ccaaatataa aaatatctaa
      601 taaattagat gaatatcaaa gaatctattg atttagtgta ttattaaatg tatatcttaa
      661 ttcaatatta ttattctatt catttttatt cattttcaat tttataatat attaatctat
      721 atattaattt ataattctat tctaattcga attcaatttt taaatattca tattcaatta
      781 aaattgaaat tttttcattc gcgaggagcc ggatgagaag aaactctcat gtccggttct
      841 gtagtagaga tggaattaag aaaaaaccat caactataac cccaagagaa ccaga

In this case, we are just getting the raw records. To get the records in a more Python-friendly form, we can use Bio.SeqIO to parse the GenBank data into SeqRecord objects, including SeqFeature objects (see the Bio.SeqIO chapter):

In [79]:
from Bio import SeqIO
handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="gb", retmode="text")
records = SeqIO.parse(handle, "gb")

We can now step through the records and look at the information we are interested in:

In [80]:
for record in records:
    # record.name holds the LOCUS name, e.g. HQ621368
    print("%s, length %i, with %i features"
          % (record.name, len(record), len(record.features)))

HQ621368, length 399, with 3 features
HM041482, length 1197, with 3 features
HM041481, length 1200, with 3 features
HM041480, length 1153, with 3 features
HM041479, length 1197, with 3 features
HM041478, length 1187, with 3 features
HM041477, length 1197, with 3 features
HM041476, length 1205, with 3 features
HM041474, length 1163, with 3 features
HM041473, length 1203, with 3 features
HM041472, length 1182, with 3 features
HM041469, length 1189, with 3 features
HM041468, length 1202, with 3 features
HM041467, length 1199, with 3 features
HM041466, length 1205, with 3 features
HM041465, length 1190, with 3 features
HM041464, length 1184, with 3 features
AY851612, length 892, with 3 features
AY851611, length 881, with 3 features
AF191661, length 895, with 3 features

Using this automated query and retrieval functionality is a big plus over doing things by hand. Although the module should obey the NCBI’s limit of three queries per second, the NCBI have other recommendations, such as avoiding peak hours; see the section Entrez Guidelines. In particular, please note that for simplicity, this example does not use the WebEnv history feature. You should use it for any non-trivial search and download work; see the section History and WebEnv.

Finally, if you plan to repeat your analysis, rather than downloading the files from the NCBI and parsing them immediately (as shown in this example), you should download the records once, save them to your hard disk, and then parse the local file.
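
As a minimal sketch of this download-once approach, the helper below writes the records to disk only when the file is missing. The name cached_download, the download callable, and the file name in the usage comment are our own illustrative choices, not part of Biopython:

```python
import os


def cached_download(path, download):
    """Return path, calling download() to create the file only if it is missing.

    download is any zero-argument callable that returns the record text.
    """
    if not os.path.exists(path):
        text = download()
        with open(path, "w") as out_handle:
            out_handle.write(text)
    return path


# Hypothetical usage with the search above (needs Biopython and a network
# connection; the file name is an arbitrary example):
#
# from Bio import Entrez, SeqIO
#
# def download():
#     handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="gb", retmode="text")
#     try:
#         return handle.read()
#     finally:
#         handle.close()
#
# path = cached_download("opuntia_rpl16.gbk", download)
# for record in SeqIO.parse(path, "gb"):
#     print(record.name)
```

On a second run the file already exists, so no request is sent to the NCBI and the records are parsed straight from disk.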

Finding the lineage of an organism

Staying with a plant example, let’s now find the lineage of the Cypripedioideae orchid subfamily. First, we search the Taxonomy database for Cypripedioideae, which yields exactly one NCBI taxonomy identifier:

In [81]:
from Bio import Entrez
Entrez.email = ""     # Always tell NCBI who you are (fill in your email address)
handle = Entrez.esearch(db="Taxonomy", term="Cypripedioideae")
record = Entrez.read(handle)


In [82]:
record["IdList"]

['158330']

Now, we use efetch to download this entry in the Taxonomy database, and then parse it:

In [83]:
handle = Entrez.efetch(db="Taxonomy", id="158330", retmode="xml")
records = Entrez.read(handle)

Again, this record stores lots of information:

In [84]:
records[0].keys()

dict_keys(['PubDate', 'ScientificName', 'Division', 'MitoGeneticCode', 'GeneticCode', 'CreateDate', 'Rank', 'ParentTaxId', 'LineageEx', 'TaxId', 'Lineage', 'UpdateDate', 'OtherNames'])

We can get the lineage directly from this record:

In [85]:
records[0]["Lineage"]

'cellular organisms; Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta; Mesangiospermae; Liliopsida; Petrosaviidae; Asparagales; Orchidaceae'

The record data contains much more than just the information shown here; for example, look under LineageEx instead of Lineage and you’ll get the NCBI taxon identifiers of the lineage entries too.
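
To make the LineageEx remark concrete, here is a small sketch that lists the taxon identifier, rank and name of each lineage entry. It assumes each LineageEx entry carries the keys TaxId, Rank and ScientificName; the sample dictionary is a hand-written, shortened stand-in for records[0], not real Entrez output:

```python
def lineage_with_ids(taxon):
    """Return (TaxId, Rank, ScientificName) tuples for each LineageEx entry."""
    return [
        (entry["TaxId"], entry["Rank"], entry["ScientificName"])
        for entry in taxon["LineageEx"]
    ]


# Hand-written stand-in for records[0]; a real record lists the full lineage.
sample = {
    "LineageEx": [
        {"TaxId": "33090", "ScientificName": "Viridiplantae", "Rank": "kingdom"},
        {"TaxId": "4747", "ScientificName": "Orchidaceae", "Rank": "family"},
    ]
}

for tax_id, rank, name in lineage_with_ids(sample):
    print("%s\t%s\t%s" % (tax_id, rank, name))
```

With a live record you would call lineage_with_ids(records[0]) instead of passing the sample.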

Using the history and WebEnv

Often you will want to make a series of linked queries, most typically running a search, perhaps refining it, and then retrieving detailed search results. You can do this by making a series of separate calls to Entrez. However, the NCBI prefer you to take advantage of their history support, for example by combining ESearch and EFetch.

Another typical use of the history support would be to combine EPost and EFetch. You use EPost to upload a list of identifiers, which starts a new history session. You then download the records with EFetch by referring to the session (instead of the identifiers).
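
The EPost and EFetch combination could be sketched as follows. The function name fetch_posted_ids is our own, and the entrez argument is expected to be the Bio.Entrez module (it is passed in so the logic can be tried with a stand-in object). We assume, as for ESearch with history, that the parsed EPost result exposes the WebEnv and QueryKey keys:

```python
def fetch_posted_ids(entrez, db, id_list, **fetch_options):
    """Upload id_list with EPost, then EFetch the records by session reference.

    entrez is expected to be the Bio.Entrez module (or an object offering the
    same epost, read and efetch functions).
    """
    # EPost starts a new history session holding the uploaded identifiers.
    posted = entrez.read(entrez.epost(db, id=",".join(id_list)))
    # EFetch then refers to the session instead of repeating the identifiers.
    return entrez.efetch(
        db=db,
        webenv=posted["WebEnv"],
        query_key=posted["QueryKey"],
        **fetch_options
    )


# Hypothetical usage (needs Biopython and a network connection):
#
# from Bio import Entrez
# Entrez.email = ""  # fill in your email address
# handle = fetch_posted_ids(Entrez, "pubmed", ["19304878", "14630660"],
#                           rettype="medline", retmode="text")
# print(handle.read())
```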

Searching for and downloading sequences using the history

Suppose we want to search and download all the Opuntia rpl16 nucleotide sequences, and store them in a FASTA file. As shown in Section [sec:entrez-search-fetch-genbank], we can naively combine Bio.Entrez.esearch() to get a list of GI numbers, and then call Bio.Entrez.efetch() to download them all.

However, the approved approach is to run the search with the history feature. Then, we can fetch the results by reference to the search results - which the NCBI can anticipate and cache.

To do this, call Bio.Entrez.esearch() as normal, but with the additional argument usehistory="y":

In [86]:
from Bio import Entrez
Entrez.email = ""     # fill in your email address
search_handle = Entrez.esearch(db="nucleotide",term="Opuntia[orgn] and rpl16", usehistory="y")
search_results = Entrez.read(search_handle)
search_handle.close()

When you get the XML output back, it will still include the usual search results. However, it also contains two additional pieces of information, the WebEnv session cookie and the QueryKey:

In [87]:
gi_list = search_results["IdList"]
count = int(search_results["Count"])
assert count == len(gi_list)  # holds while count is within esearch's default retmax of 20
webenv = search_results["WebEnv"]
query_key = search_results["QueryKey"]
print("The WebEnv is {}".format(webenv))
print("The QueryKey is {}".format(query_key))

The WebEnv is NCID_1_946410500_130.14.18.34_9001_1452651901_1799213676_0MetA0_S_MegaStore_F_1
The QueryKey is 1

Having stored these values in the variables webenv and query_key, we can use them as parameters to Bio.Entrez.efetch() instead of giving the GI numbers as identifiers.

While for small searches you might be OK downloading everything at once, it is better to download in batches. You use the retstart and retmax parameters to specify which range of search results you want returned (the starting entry, using zero-based counting, and the maximum number of results to return). Sometimes you will get intermittent errors from Entrez (HTTPError 5XX); we use a try/except block that pauses and retries to address this. For example:

from Bio import Entrez
import time
try:
    from urllib.error import HTTPError  # for Python 3
except ImportError:
    from urllib2 import HTTPError  # for Python 2

# Assumes count, webenv and query_key were set from the search above
batch_size = 3
out_handle = open("orchid_rpl16.fasta", "w")
for start in range(0, count, batch_size):
    end = min(count, start+batch_size)
    print("Going to download record %i to %i" % (start+1, end))
    attempt = 1
    while attempt <= 3:
        try:
            fetch_handle = Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text",
                                         retstart=start, retmax=batch_size,
                                         webenv=webenv, query_key=query_key)
            break  # success, so stop retrying
        except HTTPError as err:
            if 500 <= err.code <= 599 and attempt < 3:
                print("Received error from server %s" % err)
                print("Attempt %i of 3" % attempt)
                attempt += 1
                time.sleep(15)  # pause before retrying
            else:
                raise  # not a server error, or out of attempts
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()

For illustrative purposes, this example downloaded the FASTA records in batches of three. Unless you are downloading genomes or chromosomes, you would normally pick a larger batch size.
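
If you find yourself repeating this pattern, the pause-and-retry logic can be pulled into a small helper. The following is only a sketch under our own naming (fetch_with_retry, and the attempts, pause and sleep parameters are illustrative, not a Biopython API); it targets Python 3 and takes any zero-argument callable that performs the request:

```python
import time
from urllib.error import HTTPError


def fetch_with_retry(fetch, attempts=3, pause=15, sleep=time.sleep):
    """Call fetch() and return its result, retrying after HTTP 5XX errors."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except HTTPError as err:
            # Give up at once on non-server errors, or when out of attempts.
            if not 500 <= err.code <= 599 or attempt == attempts:
                raise
            print("Received error from server %s" % err)
            print("Attempt %i of %i" % (attempt, attempts))
            sleep(pause)


# Hypothetical usage inside the batch loop above:
#
# fetch_handle = fetch_with_retry(
#     lambda: Entrez.efetch(db="nucleotide", rettype="fasta", retmode="text",
#                           retstart=start, retmax=batch_size,
#                           webenv=webenv, query_key=query_key))
```

Passing the sleep function as a parameter is a design choice that lets you try the helper without actually waiting between attempts.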

Searching for and downloading abstracts using the history

Here is another history example, searching for papers published in the last year about Opuntia, and then downloading them into a file in MEDLINE format:

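from Bio import Entrez
import time
try:
    from urllib.error import HTTPError  # for Python 3
except ImportError:
    from urllib2 import HTTPError  # for Python 2
Entrez.email = ""     # fill in your email address
search_results = Entrez.read(Entrez.esearch(db="pubmed",
                                            term="Opuntia[ORGN]",
                                            reldate=365, datetype="pdat",
                                            usehistory="y"))
count = int(search_results["Count"])
print("Found %i results" % count)

batch_size = 10
out_handle = open("recent_orchid_papers.txt", "w")
for start in range(0,count,batch_size):
    end = min(count, start+batch_size)
    print("Going to download record %i to %i" % (start+1, end))
    attempt = 1
    while attempt <= 3:
        try:
            fetch_handle = Entrez.efetch(db="pubmed",rettype="medline",
                                         retmode="text",
                                         retstart=start, retmax=batch_size,
                                         webenv=search_results["WebEnv"],
                                         query_key=search_results["QueryKey"])
            break  # success, so stop retrying
        except HTTPError as err:
            if 500 <= err.code <= 599 and attempt < 3:
                print("Received error from server %s" % err)
                print("Attempt %i of 3" % attempt)
                attempt += 1
                time.sleep(15)  # pause before retrying
            else:
                raise  # not a server error, or out of attempts
    data = fetch_handle.read()
    fetch_handle.close()
    out_handle.write(data)
out_handle.close()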

At the time of writing, this gave 28 matches - but because this is a date dependent search, this will of course vary. As described in Section [subsec:entrez-and-medline] above, you can then use Bio.Medline to parse the saved records.

Back in Section [sec:elink] we mentioned ELink can be used to search for citations of a given paper. Unfortunately this only covers journals indexed for PubMed Central (doing it for all the journals in PubMed would mean a lot more work for the NIH). Let’s try this for the Biopython PDB parser paper, PubMed ID 14630660:

In [88]:
from Bio import Entrez
Entrez.email = ""     # fill in your email address
pmid = "14630660"
results = Entrez.read(Entrez.elink(dbfrom="pubmed", db="pmc",
                                   LinkName="pubmed_pmc_refs", id=pmid))
pmc_ids = [link["Id"] for link in results[0]["LinkSetDb"][0]["Link"]]
pmc_ids


Great - eleven articles. But why hasn’t the Biopython application note been found (PubMed ID 19304878)? Well, as you might have guessed from the variable names, these are not actually PubMed IDs, but PubMed Central IDs. Our application note is the third citing paper in that list, PMCID 2682512.

So, what if (like me) you’d rather get back a list of PubMed IDs? Well, we can call ELink again to translate them. This becomes a two-step process, so by now you should expect to use the history feature to accomplish it (Section History and WebEnv).

But first, taking the more straightforward approach of making a second (separate) call to ELink:

In [89]:
results2 = Entrez.read(Entrez.elink(dbfrom="pmc", db="pubmed", LinkName="pmc_pubmed",
                                    id=",".join(pmc_ids)))
pubmed_ids = [link["Id"] for link in results2[0]["LinkSetDb"][0]["Link"]]
pubmed_ids
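
The list comprehensions above index LinkSetDb with [0], which we would expect to fail with an IndexError if ELink found no links at all. A more defensive sketch is shown below; the helper name linked_ids is our own, and the sample data is a hand-written stand-in shaped like the parsed ELink output used above (it reuses the PMCID mentioned in the text):

```python
def linked_ids(elink_results):
    """Collect the linked IDs from parsed ELink output, tolerating empty link sets."""
    ids = []
    for link_set in elink_results:
        # LinkSetDb is assumed to be an empty list when nothing was linked.
        for link_db in link_set.get("LinkSetDb", []):
            ids.extend(link["Id"] for link in link_db["Link"])
    return ids


# Hand-written stand-in for Entrez.read(Entrez.elink(...)), not real output.
sample_results = [
    {"LinkSetDb": [{"Link": [{"Id": "2682512"}]}]},
    {"LinkSetDb": []},
]
print(linked_ids(sample_results))
```

With live data you would write pmc_ids = linked_ids(results) in place of the list comprehension.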

